Abstract. A game-theoretic model of scrip (artificial currency) systems is analyzed. It is 
shown that relative entropy can be used to characterize the distribution of agent wealth when 
all agents use threshold strategies — that is, they volunteer to do work iff they have below a 
threshold amount of money. Monotonicity of agents' best-reply functions is used to show 
that scrip systems have pure strategy equilibria where all agents use threshold strategies. 
An algorithm is given that can compute such an equilibrium and the resulting distribution 
of wealth. 

1. Introduction. Historically, non-governmental organizations have issued
their own currencies for a wide variety of purposes. These currencies,
known as scrip, have been used in company towns where government-issued
currency was scarce [44], in Washington DC to reduce the robbery rate of bus
drivers [37], and in Ithaca (New York) to promote fairer pay and improve the 
local economy [22]. Scrip systems are also becoming more prevalent in online 
systems. 

To give some examples, market-based solutions using scrip systems have 
been suggested for applications such as system-resource allocation [33], man- 
aging replication and query optimization in a distributed database [12], and 
allocating experimental time on a wireless sensor network test bed [9]; a number
of sophisticated scrip systems have been proposed [19] EU [46] to allow
agents to pool resources while avoiding what is known as free riding, where 
agents take advantage of the resources the system provides while providing 
none of their own (as Adar and Huberman [1] have shown, this behavior certainly
takes place in systems such as Gnutella); and Yootles [38] uses a scrip
system as a way of helping groups make decisions using economic mechanisms 
without involving real money. 

In this paper, we provide a formal model in which to analyze scrip systems.
We describe a simple scrip system and show that, under reasonable
assumptions, for each fixed amount of money there is a nontrivial equilibrium
involving threshold strategies, where an agent accepts a request if he has less
than $k$ dollars for some threshold $k$.

An interesting aspect of our analysis is that, in equilibrium, the distribu- 
tion of users with each amount of money is the distribution that minimizes 
relative entropy to an appropriate distribution (subject to the money sup- 
ply constraint). This allows us to use techniques from statistical mechanics 
to explicitly compute the distribution of money and thus agents' best-reply 
functions. 

The analysis shows that agents' best-reply functions are monotone: if all 
other agents use a higher threshold, then the best reply has a higher threshold. 
This makes our game one with what Milgrom and Roberts [32] call strategic 
complementarities. Using results of Tarski [33], Topkis [35] showed that there 
are pure strategy equilibria in such games and that there is a simple algorithm 
to find these equilibria. 

In a companion paper [28], we use our analysis of this model to answer 
questions of interest to system designers. For example, we examine how the 
quantity of money affects the efficiency of the equilibrium and show that it 
is maximized by maintaining the appropriate ratio between the total amount 
of money and the number of agents. This ratio can be found by increasing 
the money supply up to the point that the system experiences a "monetary 
crash," where money is sufficiently devalued that no agent is willing to perform 
a service. We also incorporate agents altruistically providing service, hoarding 
money, creating multiple identities, and colluding into the model. 

The rest of the paper is organized as follows. In Section 2 we review 
related work. Then in Section 3 we present the formal model. We analyze the 
distribution of money in this model when agents are using threshold strategies 
in Section 4 and show that it is characterized by relative entropy. Using this 
analysis, we show in Section 5 that, under minimal assumptions, there is a 
nontrivial equilibrium where all agents use threshold strategies. These results 
apply to a sufficiently large population of agents after a sufficiently long period 
of time, so in Section 6 we use simulations to demonstrate that these values 
are reasonable in practice. We conclude in Section 7.

2. Related Work. Scrip systems have a long history in computer sci- 
ence, with two main thrusts: resource allocation and free-riding prevention. 
Early applications for resource allocation include agoric systems [33], which 
envisioned solving problems such as processor scheduling using markets, and 
Mariposa [12], a market-driven query optimizer for distributed databases. 
More recently, scrip systems have been used to allocate the resources of re- 
search testbeds. Examples include Mirage [9] for wireless sensor networks, 
Bellagio [5] for PlanetLab, and Egg [8] for grid computing. Virtual markets 
have been used to coordinate the activity of nodes of a sensor network [30]. 

optimal for them to do so. 

2 Although we refer to our unit of scrip as the dollar, these are not real dollars, nor do we 
view "scrip dollars" as convertible to real dollars. 
Yootles [38] uses a scrip to help people make everyday decisions, such as where 
to have lunch, without involving real money. 

Systems that use scrip to prevent free riding include KARMA [36], which 
provides a general framework for P2P networks. Gupta et al. [19] propose 
what they call a "debit-credit reputation computation" for P2P networks, 
which is essentially a scrip system. Fileteller [21] uses payments in a network 
file storage system. Dandelion [41] uses scrip in a content distribution setting. 
Belenkiy et al. [6] consider how a BitTorrent-like system can make use of e-cash.
Antfarm [35] uses scrip to optimize content distribution across a number 
of BitTorrent-like swarms. 

Despite this tremendous interest in scrip systems, there has been relatively 
little work studying how they behave. While there has been extensive work 
in macroeconomics on the effect of variables such as the amount of money in 
circulation on the economy (see, for example, the relevant chapters of [7]), this 
work focuses on goals such as minimizing inflation and maximizing employ- 
ment that are not directly relevant to a system designer. 

Chun et al. [34] studied user behavior in a deployed scrip system and ob- 
served that users tried various (rational) manipulations of the auction mech- 
anism used by the system. Their observations suggest that system designers 
will have to deal with game-theoretic concerns. 

Hens et al. [20] do a theoretical analysis of what can be viewed as a scrip 
system in a related model. There are a number of significant differences between
the models. First, in the Hens et al. model, there is essentially only one 
type of agent, but an agent's utility for getting service (our $\gamma_t$) may change 
over time. Thus, at any time, there will be agents who differ in their util- 
ity. In our model, agents want service only occasionally, so, at each round, 
we assume that one agent is chosen (by nature) to request a service, while 
other agents decide whether or not to provide it. (As the number of agents 
increases, the time between rounds decreases, so as to keep this assumption 
reasonable.) In the Hens et al. model, every agent always desires service, so, 
at each round, each agent decides whether to provide service, request service, 
or opt out, as a function of his utilities and the amount of money he has. They 
assume that there is no (direct) cost for providing service and everyone is able 
to do so. However, they do assume that agents cannot simultaneously provide 
and receive service, an assumption that is typically unreasonable in a peer- 
to-peer system. Under this assumption, a system with optimal performance 
is one where half the agents request service and the other half are willing to 
provide it. Despite these differences, Hens et al. also show that agents will 
use a threshold strategy. However, although they have inefficient equilibria, 
because there is no cost for providing service, their model does not exhibit the 
monetary crashes that our model can exhibit. 

Aperjis and Johari [4] examine a model of a P2P filesharing system as an 
exchange economy. They associate a price (in bandwidth) with each file and 
find a market equilibrium in the resulting economy. Later work by Aperjis et 



I. A. KASH ET AL. 



al. does use a currency, but the focus remains on establishing market prices [3].

The ultimate goal of a scrip system is to promote cooperation. While 
there is limited theoretical work on scrip systems, there is a large body of 
work on cooperation. Much of the work involves a large group of agents being 
randomly matched to play a game such as prisoner's dilemma. Such models 
were studied in the economics literature [25], [12] and first applied to online 
reputations in [15]; Feldman et al. [13] apply them to P2P systems. 

These models fail to capture important asymmetries in the interactions of 
the agents. When a request is made, there are typically many people in the 
network who can potentially satisfy it (especially in a large P2P network), but 
not all can. For example, some people may not have the time or resources to 
satisfy the request. The random-matching process ignores the fact that some 
people may not be able to satisfy the request. (Presumably, if the person 
matched with the requester could not satisfy the match, he would have to 
defect.) Moreover, it does not capture the fact that the decision as to whether 
to "volunteer" to satisfy the request should be made before the matching 
process, not after. That is, the matching process does not capture the fact 
that if someone is unwilling to satisfy the request, there will doubtless be others 
who can satisfy it. Finally, the actions and payoffs in the prisoner's dilemma 
game do not obviously correspond to actual choices that can be made. For 
example, it is not clear what defection on the part of the requester means. 
Our model addresses all these issues. 

Scrip systems are not the only approach to preventing free riding. Two 
other approaches often used in P2P networks are barter and reputation systems.
Perhaps the best-known example of a system that uses barter is BitTorrent [10],
where clients downloading a file try to find other clients with parts 
they are missing so that they can trade, thus creating a roughly equal amount 
of work. Since the barter is restricted to users currently interested in a single 
file, this works well for popular files, but tends to have problems maintain- 
ing availability of less popular ones. An example of a barter-like system built 
on top of a more traditional file-sharing system is the credit system used by 
eMule. Each user tracks his history of interactions with other users and gives 
priority to those he has downloaded from in the past. However, in a large sys- 
tem, the probability that a pair of randomly-chosen users will have interacted 
before is quite small, so this interaction history will not be terribly helpful. 
Anagnostakis and Greenwald [2] present a more sophisticated version of this 
approach, but it still seems to suffer from similar problems. More recently, 
Piatek et al. [36] have proposed a model based on including intermediaries a 
single hop away; this model is more liquid than barter but not as liquid as a 
full scrip system. 

A number of attempts have been made at providing general reputation 
systems (e.g. [181 032 12H EH] ). The basic idea is to aggregate each user's experience
into a global number for each individual that intuitively represents the 
system's view of that individual's reputation. However, these attempts tend 
to suffer from practical problems because they implicitly view users as either 
"good" or "bad", assume that the "good" users will act according to the specified
protocol, and that there are relatively few "bad" users. Unfortunately, if 
there are easy ways to game the system, once this information becomes widely 
available, rational users are likely to make use of it. We cannot count on only 
a few users being "bad" (in the sense of not following the prescribed protocol). 
For example, Kazaa uses a measure of the ratio of the number of uploads to 
the number of downloads to identify good and bad users. However, to avoid 
penalizing new users, they gave new users an average rating. Users discovered 
that they could use this relatively good rating to free ride for a while and, 
once it started to get bad, they could delete their stored information and ef- 
fectively come back as a "new" user, thus circumventing the system (see [2] 
for a discussion and [15] for a formal analysis of this "whitewashing"). Thus, 
Kazaa's reputation system is ineffective. 

3. The Model. Before specifying our model formally, we give an intuitive 
description of what our model captures. While our model simplifies a number 
of features (as does any model), we believe that it provides useful insights. We 
model a scrip system where, as in a P2P filesharing system, agents provide 
each other with service. There is a single service (such as file uploading) 
that agents occasionally want. In practice, at any given time, a number of 
agents will want service but, to simplify the formal description and analysis, 
we model the scrip system as proceeding in a series of rounds where, in each 
round, a single agent wants service (the time between rounds will be adjusted 
to capture the growth in parallelism as the number of agents grows). In each 
round, after an agent requests service, other agents have to decide whether 
they want to volunteer to provide service. However, not all agents may be 
able to satisfy the request (not everyone has every file). While, in practice, 
the ability of agents to provide service at various times may be correlated for 
a number of reasons (if I don't have the file today I probably still don't have 
it tomorrow; if one agent does not have a file, it may be because it is rare, 
so that should increase the probability that other agents do not have it), for 
simplicity, we assume that the events of an agent being able to provide service 
in different rounds or two agents being able to provide service in the same or 
different rounds are independent. While our analysis relies on this assumption 
so that we can describe the system using a Markov chain, we expect that 
our results would still hold as long as these correlations are sufficiently small. If 
there is at least one volunteer, someone is chosen from among the volunteers 
(at random) to satisfy the request. Our model allows some agents to be more 
likely to be chosen (perhaps they have more bandwidth) but does not allow 
an agent to specify which agent is chosen. Allowing agents this type of control 

3 For large numbers of agents, our model converges to one in which agents make requests in 
real time, and the time between an agent's requests is exponentially distributed. In addition, 
the time between requests served by a single player is also exponentially distributed. 
would break the symmetries we use to characterize the long run behavior of 
the system and create new opportunities for strategic behavior by agents that 
are beyond the scope of this paper. The requester then gains some utility (he 
got the file) and the volunteer loses some utility (he had to use his bandwidth 
to upload the file), and the requester pays the volunteer a fee that we fix at one 
dollar. As is standard in the literature, we assume that agents discount future 
payoffs. This captures the intuition that utility now is worth more than utility 
tomorrow, and allows us to compute the total utility derived by an agent in 
an infinite game. The amount of utility gained by having a service performed 
and the amount lost by performing it, as well as many other parameters, may 
depend on the agent. 

More formally, we assume that agents have a type $t$ drawn from some finite 
set $T$ of types. We can describe the entire population of agents using the pair 
$(T, f)$, where $f$ is a vector of length $|T|$ and $f_t$ is the fraction with type $t$. For 
most of the paper, we consider only what we call standard agents. These are 
agents who derive no pleasure from performing a service, and for whom money 
has no intrinsic value. Thus, for a standard agent, there is no direct connection 
between money (measured in dollars) and utility (measured in utils). We can 
characterize the type of a standard agent by a tuple $t = (\alpha_t, \beta_t, \gamma_t, \delta_t, \rho_t, \chi_t)$, 
where 

• $\alpha_t > 0$ reflects the cost for an agent of type $t$ to satisfy a request; 

• $0 < \beta_t < 1$ is the probability that an agent of type $t$ can satisfy a 
request; 

• $\gamma_t > \alpha_t$ is the utility that an agent of type $t$ gains for having a request 
satisfied; 

• $0 < \delta_t < 1$ is the rate at which an agent of type $t$ discounts utility; 

• $\rho_t > 0$ represents the (relative) request rate (some people want files 
more often than others). For example, if there are two types of agents 
with $\rho_{t_1} = 2$ and $\rho_{t_2} = 1$ then agents of the first type will make requests 
twice as often as agents of the second type. Since these request rates are 
relative, we can multiply them all by a constant to normalize them. 
To simplify later notation, we assume the $\rho_t$ are normalized so that 
$\sum_{t \in T} \rho_t f_t = 1$; 

• $\chi_t > 0$ represents the (relative) likelihood of an agent to be chosen when 
he volunteers (some uploaders may be more popular than others). In 
particular, this means the relative probability of two given agents being 
chosen is independent of which other agents volunteer; and 

• $\omega_t = \beta_t \chi_t / \rho_t$ is not part of the tuple, but is an important derived 
parameter that, as we will see in Section 4, helps determine how much 
money an agent will have. 

We occasionally omit the subscript t on some of these parameters when it is 
clear from context or irrelevant. 
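
As a concrete illustration of the parameters above, the following Python sketch stores one standard type and rescales the relative request rates so that, weighted by the type fractions, they sum to 1. The class `AgentType`, its field names, and the helpers `derived_ratio` and `normalize_request_rates` are hypothetical names chosen for this example; they are not part of the model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentType:
    """One standard agent type; field names are illustrative assumptions."""
    alpha: float  # cost of satisfying a request (alpha > 0)
    beta: float   # probability of being able to satisfy a request (0 < beta < 1)
    gamma: float  # utility of having a request satisfied (gamma > alpha)
    delta: float  # discount rate (0 < delta < 1)
    rho: float    # relative request rate (rho > 0)
    chi: float    # relative likelihood of being chosen when volunteering (chi > 0)

    def derived_ratio(self) -> float:
        """The derived parameter beta * chi / rho."""
        return self.beta * self.chi / self.rho


def normalize_request_rates(types, fractions):
    """Rescale every rho so that the sum over types of rho_t * f_t equals 1."""
    total = sum(t.rho * f for t, f in zip(types, fractions))
    return [replace(t, rho=t.rho / total) for t in types]
```

Because the request rates are only relative, the rescaling preserves every ratio between two types' rates, which is all the model depends on.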

Representing the population of agents in a system as $(T, f)$ captures the 
essential features of a scrip system we want to model: there are a large number 
of agents who may have different types. Note that some tuples $(T, f)$ may be 
incompatible with there being some number $N$ of agents. For example, if there 
are two types, and $f$ says that half of the agents are of each type, then there 
cannot be 101 agents. Similar issues arise when we want to talk about the 
amount of money in a system. We could deal with this problem in a number 
of ways (for example, by having each agent determine his type at random 
according to the distribution $f$). For convenience, we simply do not consider 
population sizes that are incompatible with $f$. This is the approach used in 
the literature on $N$-replica economies [31]. 

Formally, we consider games specified by a tuple $(T, f, h, m, n)$, where $T$ 
and $f$ are as defined above, $h \in \mathbb{N}$ is the base number of agents of each type, 
$n \in \mathbb{N}$ is the number of replicas of these agents, and $m \in \mathbb{R}^+$ is the average amount 
of money. The total number of agents is thus $hn$. We ensure that the fraction 
of agents of type $t$ is exactly $f_t$ and that the average amount of money is exactly 
$m$ by requiring that $f_t h \in \mathbb{N}$ and $mh \in \mathbb{N}$. Having created a base population 
satisfying these constraints, we can make an arbitrary number of copies of 
it. More precisely, we assume that agents $0, \ldots, f_{t_1} h - 1$ have type $t_1$, agents 
$f_{t_1} h, \ldots, (f_{t_1} + f_{t_2}) h - 1$ have type $t_2$, and so on through agent $h - 1$. These base 
agents determine the types of all other agents. Each agent $j \in \{h, \ldots, hn - 1\}$ 
has the same type as $j \bmod h$; that is, all the agents of the form $j + kh$ for 
$k = 1, \ldots, n - 1$ are replicas of agent $j$. 
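
The replica construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_types` is hypothetical, types are represented by their index in the list of fractions, and the fractions are passed as exact `Fraction` values so that the integrality requirement can be checked without rounding.

```python
from fractions import Fraction


def assign_types(fractions, h, n):
    """Return tau, where tau[j] is the type index of agent j among h * n agents.

    `fractions` holds the type fractions as exact Fraction values (an
    assumption of this sketch, so the integrality check below is exact).
    """
    counts = [f * h for f in fractions]
    if any(c.denominator != 1 for c in counts):
        raise ValueError("each type fraction times h must be an integer")
    # Base population: the first block of agents has type 0, the next type 1, ...
    base = [t for t, c in enumerate(counts) for _ in range(int(c))]
    if len(base) != h:
        raise ValueError("the type fractions must sum to 1")
    # Agent j has the same type as base agent j mod h.
    return [base[j % h] for j in range(h * n)]
```

For example, with two equally common types, `assign_types([Fraction(1, 2), Fraction(1, 2)], 2, 3)` yields the six-agent population `[0, 1, 0, 1, 0, 1]`, while a base population of 101 agents is rejected as incompatible.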

We also need to specify how money is initially allocated to agents. Our 
results are based on the long-run behavior of the system and so they turn out 
to hold for any initial allocation of money. For simplicity, at the start of the 
game we allocate each of the hmn dollars in the system to an agent chosen 
uniformly at random, but all our results would hold if we chose any other 
initial distribution of money. 

To make precise our earlier informal description, we describe $(T, f, h, m, n)$ 
as an infinite extensive-form game. A non-root node in the game tree is associated
with a round number (how many requests have been made so far), 
a phase number, either 1, 2, 3, or 4 (which describes how far along we are 
in determining the results of the current request), a vector $x$ where $x_i$ is the 
current amount of money agent $i$ has, and $\sum_i x_i = mhn$, and, for some nodes, 
some additional information whose role will be made clear below. We use $\tau(i)$ 
to denote the type of agent $i$. 

• The game starts at a special root node, denoted $\Lambda$, where nature moves. 
Intuitively, at $\Lambda$, nature allocates money uniformly at random, so it 
transitions to a node of the form $(0, 1, x)$: round zero, phase one, and 
allocation of money $x$, and each possible transition is equally likely. 

• At a node of the form $(r, 1, x)$, nature selects an agent to make a request 
in the current round. Agent $i$ is chosen with probability $\rho_{\tau(i)}/hn$. If $i$ 
is chosen, a transition is made to $(r, 2, x, i)$. 

• At a node of the form $(r, 2, x, i)$, nature selects the set $V$ of agents (not 
including $i$) able to satisfy the request. Each agent $j \neq i$ is included 
in $V$ with probability $\beta_{\tau(j)}$. If $V$ is chosen, a transition is made to 
$(r, 3, x, i, V)$. 

• At a node of the form $(r, 3, x, i, V)$, each agent in $V$ chooses whether 
to volunteer. If $V'$ is the set of agents who choose to volunteer, a 
transition is made to $(r, 4, x, i, V')$. 

• At a node of the form $(r, 4, x, i, V')$, if $V' \neq \emptyset$, nature chooses a single
agent in $V'$ to satisfy the request. Each agent $j$ is chosen with 
probability $\chi_{\tau(j)} / \sum_{j' \in V'} \chi_{\tau(j')}$. If $j$ is chosen, a transition is made to 
$(r + 1, 1, x')$, where 

$$x'_j = \begin{cases} x_j - 1 & \text{if } j = i \text{ and } x_i > 0, \\ x_j + 1 & \text{if } j \text{ is chosen by nature and } x_i > 0, \\ x_j & \text{otherwise.} \end{cases}$$

If $V' = \emptyset$, nature has no choice; a transition is made to $(r + 1, 1, x)$ 
with probability 1. 

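
The four phases of a round can be sketched as a single Python function. This is a minimal illustration, not the authors' implementation: the function name `play_round`, the representation of types as list indices, and the callback `volunteers(j, i, x)` standing in for an agent's strategy are all assumptions of the example.

```python
import random


def play_round(x, tau, rho, beta, chi, volunteers, rng):
    """Play one round in place on the money vector x.

    x[i] is agent i's money, tau[i] its type index, and rho, beta, chi are
    per-type parameter lists. `volunteers(j, i, x)` is a hypothetical callback
    standing in for agent j's strategy. Returns the agent chosen to satisfy
    the request, or None if nobody volunteered.
    """
    agents = list(range(len(x)))
    # Phase 1: nature picks the requester with probability proportional to
    # its type's request rate (with normalized rates this is rate / hn).
    i = rng.choices(agents, weights=[rho[tau[a]] for a in agents])[0]
    # Phase 2: each other agent is able to serve with its type's probability.
    able = [j for j in agents if j != i and rng.random() < beta[tau[j]]]
    # Phase 3: each able agent decides whether to volunteer.
    willing = [j for j in able if volunteers(j, i, x)]
    if not willing:
        return None  # no volunteers: the money vector is unchanged
    # Phase 4: nature picks one volunteer with probability proportional to chi.
    j = rng.choices(willing, weights=[chi[tau[v]] for v in willing])[0]
    if x[i] > 0:  # the dollar changes hands only if the requester has one
        x[i] -= 1
        x[j] += 1
    return j
```

Whatever strategies are plugged in, each call either leaves `x` unchanged or moves exactly one dollar from the requester to the chosen volunteer, so the total amount of money is conserved.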
A strategy for agent $j$ describes whether or not he will volunteer at every 
node of the form $(r, 3, x, i, V)$ such that $j \in V$. (These are the only nodes where 
$j$ can move.) We also need to specify what agents know when they make their 
decisions. To make our results as strong as possible, we allow an agent to base 
his strategy on the entire history of the game, which includes, for example, the 
current wealth of every other agent. As we show, even with this unrealistic 
amount of information available, it would still be approximately optimal to 
adopt a simple strategy that requires little information: specifically, agents 
need to know only their current wealth. That means that our results would 
continue to hold as long as agents knew at least this information. A strategy 
profile $S$ consists of one strategy per agent. A strategy profile $S$ determines a 
probability distribution $\Pr_S$ over paths in the game tree. Each path determines 
the value of the following two random variables: 

• $x_i^r$, the amount of money agent $i$ has during round $r$, defined as the 
value of $x_i$ at the nodes with round number $r$; and 

• $u_i^r$, the utility of agent $i$ for round $r$. If $i$ is a standard agent, then 

$$u_i^r = \begin{cases} \gamma_{\tau(i)} & \text{if a node } (r, 4, x, i, V') \text{ is on the path with } V' \neq \emptyset, \\ -\alpha_{\tau(i)} & \text{if } i \text{ is chosen by nature at node } (r, 4, x, j, V'), \\ 0 & \text{otherwise.} \end{cases}$$

$U_i(S)$, the total expected utility of agent $i$ if strategy profile $S$ is played, 
is the discounted sum of his per-round utilities $u_i^r$, but the exact form of the 
discounting requires some explanation. In our model, only one agent makes a 
request each round. As the number of agents increases, an agent has to wait 
a larger number of rounds to make requests, so naively discounting utility 
would mean his utility decreases as the number of agents increases, even if all 
of his requests are satisfied. This is an artifact of our model breaking time into 
discrete rounds where a single agent makes a request. In reality, many agents 
make requests in parallel, and how often an agent desires service typically does 
not depend on the number of agents. It would be counterintuitive to have a 
model that says that if agents make requests at a fixed rate and they are all 
satisfied, then their expected utility depends on the number of other agents. 
As the following lemma shows, there is a unique discount rate that removes 
this dependency.

Lemma 3.1. With a discount rate of $1 - (1 - \delta_t)/n$, an agent of type $t$'s 
expected discounted utility for having all his requests satisfied is independent of 
the number of replicas $n$. Furthermore, this is the unique such rate such that 
the discount rate is $\delta_t$ when $n = 1$. 

Proof. The agent makes a request each round with probability $\rho_t/(hn)$, so 
his expected discounted utility for having all his requests satisfied is 

$$\sum_{r=0}^{\infty} \left(1 - \frac{1 - \delta_t}{n}\right)^r \frac{\rho_t \gamma_t}{hn} = \frac{\rho_t \gamma_t / (hn)}{1 - \left(1 - (1 - \delta_t)/n\right)} = \frac{\rho_t \gamma_t / h}{1 - \delta_t}.$$

This is independent of $n$ and satisfies $1 - (1 - \delta_t)/1 = \delta_t$ as desired. It is 
unique because choosing any other discount rate for some $n$ will cause the 
value of the sum to differ from $(\rho_t \gamma_t / h)/(1 - \delta_t)$ for that $n$. $\Box$
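
The lemma can be checked numerically by truncating the infinite sum. The Python sketch below does this; the function name `discounted_request_utility`, its argument names, and the truncation length `rounds` are choices made for this example, and the parameter values in the usage note are arbitrary.

```python
def discounted_request_utility(rho, gamma, delta, h, n, rounds=20000):
    """Truncated sum over rounds r of rate**r * rho * gamma / (h * n)."""
    rate = 1 - (1 - delta) / n         # per-round discount rate from the lemma
    per_round = rho * gamma / (h * n)  # expected utility from requests in one round
    return sum(rate ** r * per_round for r in range(rounds))
```

With `rho = 1`, `gamma = 1`, `delta = 0.9`, and `h = 4`, the closed form gives (1 · 1 / 4) / (1 − 0.9) = 2.5, and the truncated sum agrees with this value for `n = 1`, `n = 2`, and `n = 10` up to floating-point and truncation error.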

As is standard in economics, for example in the folk theorem for repeated 
games [E], we multiply an agent's utility by $(1 - \delta_t)$ so that his expected 
utility is independent of his discount rate as well. With these considerations 
in mind, the total expected utility of agent $i$ given the vector of strategies $S$ is 

$$U_i(S) = (1 - \delta_{\tau(i)}) \sum_{r=0}^{\infty} \left(1 - \frac{1 - \delta_{\tau(i)}}{n}\right)^r E_S[u_i^r]. \qquad (3.1)$$

In modeling the game this way, we have implicitly made a number of 
assumptions. For example, we have assumed that all of agent i's requests that 
are satisfied give agent i the same utility, and that prices are fixed. We discuss 
the implications of these assumptions in Section 7.

Our solution concept is the standard notion of an approximate Nash equilibrium.
As usual, given a strategy profile $S$ and agent $i$, we use $(S'_i, S_{-i})$ to 
denote the strategy profile that is identical to $S$ except that agent $i$ uses $S'_i$. 

Definition 3.2. A strategy $S_i$ for agent $i$ is an $\epsilon$-best reply to a strategy 
profile $S_{-i}$ for the agents other than $i$ in the game $(T, f, h, m, n)$ if, for all 
strategies $S'_i$, 

$$U_i(S'_i, S_{-i}) \leq U_i(S_i, S_{-i}) + \epsilon.$$

4 In preliminary versions of this work we used the discount rate of $\delta_t^{1/n}$. This rate captures 
the intuitive idea of making the time between rounds $1/n$, but results in an agent's utility 
depending on the number of other agents, even if all the agent's requests are satisfied. 
However, in the limit as $\delta_t$ goes to 1, agents' normalized expected utilities (multiplied by 
$1 - \delta_t$ as in Equation (3.1)) are the same with either discount rate, so our main results hold with 
the discount rate $\delta_t^{1/n}$ as well. 
Definition 3.3. A strategy profile $S$ for the game $(T, f, h, m, n)$ is an 
$\epsilon$-Nash equilibrium if for all agents $i$, $S_i$ is an $\epsilon$-best reply to $S_{-i}$. A Nash 
equilibrium is an $\epsilon$-Nash equilibrium with $\epsilon = 0$. 

As we show in Section 5, $(T, f, h, m, n)$ has equilibria where agents use a 
particularly simple type of strategy called a threshold strategy. Intuitively, an 
agent with "too little" money will want to work, to minimize the likelihood of 
running out due to making a long sequence of requests before being able to 
earn more money. On the other hand, a rational agent with plenty of money 
will think it is better to delay working, thanks to discounting. These intuitions 
suggest that the agent should volunteer if and only if he has less than a certain 
amount of money. Let $s_k$ be the strategy where an agent volunteers if and 
only if the requester has at least 1 dollar and the agent has less than $k$ dollars. 
Note that $s_0$ is the strategy where the agent never volunteers. While everyone 
playing $s_0$ is a Nash equilibrium (nobody can do better by volunteering if no 
one else is willing to), it is an uninteresting one. 

We frequently consider the situation where each agent of type $t$ uses the 
same threshold strategy $s_{k_t}$. In this case, a single vector $k$ suffices to indicate the 
threshold of each type, and we can associate with this vector the strategy profile 
$S(k)$ where $S(k)_i = s_{k_{\tau(i)}}$ (i.e., agent $i$ of type $\tau(i)$ uses threshold $k_{\tau(i)}$). 
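
A threshold strategy needs only two pieces of information: the agent's own wealth and whether the requester can pay. The Python sketch below makes this explicit; the names `threshold_strategy` and `threshold_profile`, and the use of list indices for types, are assumptions made for illustration.

```python
def threshold_strategy(k):
    """Return the strategy with threshold k.

    The agent volunteers iff the requester has at least 1 dollar and the
    agent's own money is below k.
    """
    def volunteers(own_money, requester_money):
        return requester_money >= 1 and own_money < k
    return volunteers


def threshold_profile(thresholds, tau):
    """Return one strategy per agent: agent i uses the threshold of type tau[i]."""
    return [threshold_strategy(thresholds[t]) for t in tau]
```

With `k = 0` the returned strategy never volunteers, matching the degenerate equilibrium described above.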

For the rest of this paper, we focus on threshold strategies (and show why 
it is reasonable to do so). In particular, we show that, if all other agents use 
threshold strategies, it is approximately optimal for an agent to use one as well. 
Furthermore, there exist Nash equilibria where agents do so. While there are 
potentially other equilibria that use different strategies, if a system designer 
has agents use threshold strategies by default, no agent will have an incentive 
to change. Since threshold strategies have such low information requirements, 
they are a particularly attractive choice for a system designer as well as for the 
agents, since they are so easy to play. 

When we consider the threshold strategy $S(k)$, for ease of exposition, we 
assume in our analysis that $mhn < \sum_t f_t k_t hn$. To understand why, note that 
$mhn$ is the total amount of money in the system. If $mhn \geq \sum_t f_t k_t hn$, then 
if the agents use the thresholds $S(k)$, the system will quickly reach a state where 
each agent of type $t$ has at least $k_t$ dollars, so no agent will volunteer. This is equivalent to all 
agents using a threshold of 0, and similarly uninteresting. 
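
Since the factor $hn$ appears on both sides, the assumption only compares the average amount of money with the average threshold. A one-line Python check, with the hypothetical name `money_supply_is_nontrivial`, makes this concrete:

```python
def money_supply_is_nontrivial(m, fractions, thresholds):
    """True iff m is below the fraction-weighted average threshold.

    This is the assumption above with the common factor hn cancelled.
    """
    return m < sum(f * k for f, k in zip(fractions, thresholds))
```

For two equally common types with thresholds 3 and 5, the average threshold is 4, so an average of 2 dollars per agent satisfies the assumption while an average of 4 dollars does not.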

4. Analyzing the Distribution of Wealth. Our main goal is to show 
that there exists an approximate equilibrium where all agents play threshold 
strategies. In this section, we examine a more basic question: if all agents play 
a threshold strategy, what happens? We show that there is some distribution 
over money (i.e., a distribution that describes what fraction of people have 
each amount of money) such that the system "converges" to this distribution 
in a sense to be made precise shortly. 

To motivate our interest in the distribution, consider an agent i who is 
trying to decide on a best response in a setting where all other agents are 



AN EQUILIBRIUM ANALYSIS OF SCRIP SYSTEMS 



playing a threshold strategy, and all agents of a particular type play the same 
strategy. Specifically, suppose that agent i has $k, and is trying to decide 
whether to volunteer. Assume that the system is sufficiently large that one 
agent's decision does not affect the distribution. To figure out whether to 
volunteer, agent i must compute how likely he is to run out of money before 
he gets a chance to make another dollar. Clearly this depends on how much 
money he has. But it also depends on how likely he is to be chosen when 
he volunteers, which, in turn, depends on how many other volunteers there 
are. Our results show that there is a distribution of money d* such that, with 
extremely high probability, the actual distribution is almost always extremely 
close to d*. By knowing d*, the agent will know what fraction of agents of each 
type t have each amount of money. If all the agents of type t use the same 
threshold strategy, he will also know how many agents of type t volunteer. 
Moreover, this number will be essentially the same at every round. This will 
enable him to figure out when he should volunteer. 

We remark that, in addition to providing an understanding of system be- 
havior that underpins our later results, this result also provides a strong guar- 
antee about the stability of the economy. It shows that we do not have wild 
swings of behavior; in particular, the fraction of agents volunteering is essen- 
tially constant. 

Suppose that all agents of each type t use the same threshold \(k_t\), so we can
write the vector of thresholds as \(\vec{k}\). For simplicity, assume that each agent
of type t has at most \(k_t\) dollars. We can make this assumption with essentially
no loss of generality, since if someone has more than \(k_t\) dollars, he will just
spend money until he has at most \(k_t\) dollars. After this point he will never
acquire more than \(k_t\). Thus, eventually the system will be in a state where,
for all types t, no agent of type t has more than \(k_t\) dollars.

We are interested in the vectors \(\vec{x}^r\) that can be observed in round r (recall
that \(x_i^r\) is the amount of money that agent i has at round r). By assumption,
if agent i has type \(\tau(i)\), then \(x_i^r \in \{0, \ldots, k_{\tau(i)}\}\). In addition, since the total
amount of money is hmn,

\[ \vec{x}^r \in X_{T,\vec{f},h,m,n,\vec{k}} = \Big\{ \vec{x} \in \mathbb{N}^{hn} \;\Big|\; \forall i.\ x_i \le k_{\tau(i)},\ \sum_i x_i = hmn \Big\}. \]

The evolution of \(\vec{x}^r\) can be described by a Markov chain \(\mathcal{M}_{T,\vec{f},h,m,n,\vec{k}}\) over
the state space \(X_{T,\vec{f},h,m,n,\vec{k}}\). For brevity, we refer to the Markov chain and
state space as \(\mathcal{M}\) and X, respectively, when the subscripts are clear from
context. It is possible to move from state s to state s' in a single round if, by
choosing a particular agent j to make a request and another agent i to satisfy
it, i's amount of money in s' is 1 more than in s; j's amount of money in
s' is 1 less than in s; and all other agents have the same amount of money
in s and s'. Therefore, the probability of a transition from a state \(\vec{x}\) to \(\vec{y}\) is
0 unless there exist two agents i and j such that \(y_{i'} = x_{i'}\) for all \(i' \notin \{i,j\}\),
\(y_i = x_i + 1\), and \(y_j = x_j - 1\). In this case, the probability of transitioning
from \(\vec{x}\) to \(\vec{y}\) is the probability of j being chosen to make a request and i being
chosen to satisfy it. Let \(\Delta_{\vec{f},m,\vec{k}}\) denote the set of probability distributions d
on \(\bigcup_{t \in T} \{t\} \times \{0, \ldots, k_t\}\) such that for all types t, \(\sum_{l=0}^{k_t} d(t,l) = f_t\). We can
think of d(t, l) as the fraction of agents that are of type t and have l dollars.
We can associate each state \(\vec{x}\) with its corresponding distribution \(d^{\vec{x}}\). This
is a useful way of looking at the system, since we typically just care about
the fraction of people with each amount of money, not the amount that each
particular agent has. We show that, if n is large, then there is a distribution
\(d^* \in \Delta_{\vec{f},m,\vec{k}}\) such that, after a sufficient amount of time, the Markov chain \(\mathcal{M}\)
is almost always in a state \(\vec{x}\) such that \(d^{\vec{x}}\) is close to d*. Thus, agents can
base their decisions about what strategy to use on the assumption that they
will be in a state where the distribution of money is essentially d*. Note that,
since agents discount future utility, the transient behavior of the Markov chain
does matter, but by making \(\delta_t\) sufficiently large (i.e., if agents are sufficiently
patient) the effect on utility can be made arbitrarily small.
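The dynamics described above can be explored with a short simulation. The following is only an illustrative sketch under simplifying assumptions that are ours, not the paper's: there is a single type of agent, every agent can satisfy every request, and requesters and volunteers are chosen uniformly at random. The names `step`, `money_distribution`, and `simulate` are hypothetical.

```python
import random
from collections import Counter


def step(money, k, rng):
    """One round: a random agent makes a request and, if possible, pays $1."""
    requester = rng.randrange(len(money))
    if money[requester] == 0:
        return  # a requester with no money cannot pay, so nothing happens
    # Threshold strategy: an agent volunteers iff he has fewer than k dollars.
    volunteers = [j for j, x in enumerate(money) if j != requester and x < k]
    if not volunteers:
        return
    worker = rng.choice(volunteers)
    money[requester] -= 1
    money[worker] += 1


def money_distribution(money, k):
    """Fraction of agents holding each amount 0..k (the distribution d^x)."""
    counts = Counter(money)
    return [counts[l] / len(money) for l in range(k + 1)]


def simulate(n=200, m=2, k=5, rounds_per_agent=10, seed=0):
    """Run the chain from the state in which every agent holds m dollars."""
    rng = random.Random(seed)
    money = [m] * n
    for _ in range(rounds_per_agent * n):
        step(money, k, rng)
    return money


if __name__ == "__main__":
    print(money_distribution(simulate(), 5))
```

Each round conserves the total amount of money and keeps every agent between $0 and $k, so every state visited lies in the state space X.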

We can in fact completely characterize the distribution d*. Given two
distributions \(d, q \in \Delta_{\vec{f},m,\vec{k}}\), let

\[ H(d \| q) = \sum_{(t,j)\ \text{s.t.}\ q(t,j) \ne 0} d(t,j) \log \frac{d(t,j)}{q(t,j)} \]

denote the relative entropy of d relative to q (\(H(d \| q) = \infty\) if d(t,j) = 0 and
\(q(t,j) \ne 0\) or vice versa); this is also known as the Kullback-Leibler divergence
of q from d. If \(\Delta\) is a closed convex set of distributions, then it is well
known that, for each q, there is a unique distribution in \(\Delta\) that minimizes the
relative entropy to q. Since \(\Delta_{\vec{f},m,\vec{k}}\) is easily seen to be a closed convex set of
distributions, this holds in particular for \(\Delta_{\vec{f},m,\vec{k}}\). We now show that there
exists a q such that, for n sufficiently large, the Markov chain \(\mathcal{M}\) is almost
always in a state \(\vec{x}\) such that \(d^{\vec{x}}\) is close to the distribution
\(d^*_{q,\vec{f},m,\vec{k}} \in \Delta_{\vec{f},m,\vec{k}}\)
that minimizes entropy relative to q. (We omit some or all of the subscripts on
d* when they are not relevant.) The statement is correct under a number of
senses of "close". For definiteness, we consider the Euclidean distance. Given
\(\varepsilon > 0\) and q, let \(X_{T,\vec{f},h,m,n,\vec{k},\varepsilon,q}\) (or \(X_{\varepsilon,q}\), for brevity) denote the set of states
\(\vec{x} \in X_{T,\vec{f},h,m,n,\vec{k}}\) such that \(\sum_{(t,j)} |d^{\vec{x}}(t,j) - d^*_q(t,j)|^2 < \varepsilon\).

Let \(I^r_{q,n,\varepsilon}\) be the random variable that is 1 if \(\vec{x}^r \in X_{\varepsilon,q}\), and 0 otherwise.

Theorem 4.1. For all games \((T, \vec{f}, h, m, 1)\), all vectors \(\vec{k}\) of thresholds,
and all \(\varepsilon > 0\), there exist \(q \in \Delta_{\vec{f},m,\vec{k}}\) and \(n_\varepsilon\) such that, for all \(n > n_\varepsilon\), there
exists a round r* such that, for all r > r*, we have \(\Pr(I^r_{q,n,\varepsilon} = 1) > 1 - \varepsilon\).

The proof of Theorem 4.1 can be found in Appendix A. One interesting
special case of the theorem is when there exist \(\beta\), \(\chi\), and \(\rho\) such that for
all types t, \(\beta_t = \beta\), \(\chi_t = \chi\), and \(\rho_t = \rho\). In this case q is the distribution
\(q(t,j) = f_t/(k_t + 1)\) (i.e., q is uniform within each type t). We sketch the proof
for this special case here.

Proof. (Sketch) Using standard techniques, we can show that our Markov
chain has a limit distribution \(\pi\) such that for all \(\vec{y}\), \(\lim_{r \to \infty} \Pr(\vec{x}^r = \vec{y}) = \pi(\vec{y})\).
Let \(T_{\vec{x}\vec{y}}\) denote the probability of transitioning from state \(\vec{x}\) to state \(\vec{y}\). It is
easily verified by an explicit computation of the transition probabilities that
(in this special case) \(T_{\vec{x}\vec{y}} = T_{\vec{y}\vec{x}}\). It is well known that this symmetry implies
that \(\pi\) is the uniform distribution [39]. Thus, after a sufficient amount of time,
the distribution of \(\vec{x}^r\) will be arbitrarily close to uniform.

Since, for large r, \(\Pr(\vec{x}^r = \vec{y})\) is approximately 1/|X|, the probability of
\(\vec{x}^r\) being in a set of states is the size of the set divided by the total number
of states. Using a straightforward combinatorial argument, it can be shown
that the fraction of states not in \(X_{\varepsilon,q}\) is bounded by \(p(n)/e^{cn}\), where p is a
polynomial. This fraction goes to 0 as n gets large. Thus, for sufficiently large
n, \(\Pr(I^r_{q,n,\varepsilon} = 1) > 1 - \varepsilon\). □
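The symmetry claim in the proof sketch can be checked by brute force on a tiny system. The sketch below is illustrative only and rests on assumptions of ours: a single type of agent, every agent able to satisfy every request, and uniform choice of requester and of volunteer. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def states(n, k, total):
    """All assignments of `total` dollars to n agents with at most k each."""
    return [x for x in product(range(k + 1), repeat=n) if sum(x) == total]


def transitions(x, k):
    """Map each successor state of x to its one-round probability
    (self-loops are omitted)."""
    n = len(x)
    out = {}
    for i in range(n):                      # i is chosen to make a request
        if x[i] == 0:
            continue                        # i cannot pay
        vols = [j for j in range(n) if j != i and x[j] < k]
        for j in vols:                      # j is chosen to satisfy it
            y = list(x)
            y[i] -= 1
            y[j] += 1
            prob = Fraction(1, n) * Fraction(1, len(vols))
            out[tuple(y)] = out.get(tuple(y), 0) + prob
    return out


def is_symmetric(n, k, total):
    """Check that T_xy == T_yx for every pair of states."""
    T = {x: transitions(x, k) for x in states(n, k, total)}
    return all(T[y].get(x, 0) == p for x in T for y, p in T[x].items())
```

Exact rational arithmetic is used so that the equality test is not affected by rounding.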

The last portion of the proof sketch is actually a standard technique from
statistical mechanics that involves showing that there is a concentration
phenomenon around the maximum entropy distribution [23]. To illustrate what
we mean by a concentration phenomenon, consider a system with only two
dollars. With n agents, there are \(O(n^2)\) ways to assign the dollars to different
agents and O(n) ways to assign them to the same agent. If each way of
assigning the two dollars to agents is equally likely, we are far more likely to
see a distribution of money where two agents have one dollar each than one
where a single agent has two dollars. In the special case considered in the proof
sketch, when \(\pi\) is the uniform distribution, the number of states corresponding
to a particular distribution d is proportional to \(e^{nH(d)}\) (where H here is the
standard entropy function). In general, each state is not equally likely, which
is why the general proof in Appendix A uses relative entropy.\(^5\)
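The two-dollar example can be made concrete by counting states. This minimal sketch assumes the dollars are indistinguishable, so a state is determined by how much each agent holds; the function name is ours.

```python
from math import comb


def two_dollar_counts(n):
    """Number of states with the two dollars spread over two agents,
    and number of states with both dollars held by one agent."""
    spread = comb(n, 2)      # choose the two agents who hold $1 each
    concentrated = n         # choose the single agent who holds $2
    return spread, concentrated
```

For n = 1000 this gives 499500 spread states against 1000 concentrated ones, so under a uniform limit distribution the spread outcome is overwhelmingly more likely.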

Theorem 4.1 tells us that, after enough time, the distribution of money is
almost always close to some d*, where d* can be characterized as a distribution
that minimizes relative entropy subject to some constraints. In Appendix A
we show that the appropriate distribution q is \(q(t,i) = (\omega_t)^i / \big(\sum_t \sum_{j=0}^{k_t} (\omega_t)^j\big)\).
The following lemma shows how we can compute d* from q.

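Lemma 4.2.

\[ d^*(t,i) = \frac{f_t \lambda^i q(t,i)}{\sum_{j=0}^{k_t} \lambda^j q(t,j)}, \tag{4.1} \]

where \(\lambda\) is the unique value such that

\[ \sum_t \sum_i i\, d^*(t,i) = m. \tag{4.2} \]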

The proof of Lemma 4.2 is omitted because it can be easily checked using
Lagrange multipliers in the manner of [23], where the function to be minimized
is the relative entropy of d* relative to q and the constraints are that an \(f_t\)
fraction of the agents are of type t and the average amount of money is m.

5 Note that in generalizing to relative entropy we switch from maximizing to minimizing;
maximizing entropy is equivalent to minimizing relative entropy relative to the uniform
distribution.
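Lemma 4.2 suggests a simple numerical procedure: search for the value of \(\lambda\) at which the average amount of money equals m. The following is a minimal sketch, assuming q is strictly positive and 0 < m < \(\sum_t f_t k_t\); the function name, argument layout, and the use of bisection are our own choices rather than anything specified in the paper.

```python
def min_relative_entropy(f, k, q, m, tol=1e-12):
    """Return (lam, d) where d[t][i] = f[t]*lam**i*q[t][i] / sum_j lam**j*q[t][j]
    and lam is chosen so that the average amount of money is m.

    f[t]: fraction of agents of type t; k[t]: threshold of type t;
    q[t][i]: reference distribution; m: average money per agent.
    """
    if not 0 < m < sum(ft * kt for ft, kt in zip(f, k)):
        raise ValueError("need 0 < m < sum_t f_t * k_t")

    def dist(lam):
        d = []
        for ft, kt, qt in zip(f, k, q):
            weights = [lam ** i * qt[i] for i in range(kt + 1)]
            total = sum(weights)
            d.append([ft * w / total for w in weights])
        return d

    def mean(lam):
        return sum(i * p for row in dist(lam) for i, p in enumerate(row))

    # The mean is increasing in lam, so bracket the root and then bisect.
    lo, hi = 0.0, 1.0
    while mean(hi) < m:
        lo, hi = hi, 2 * hi
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if mean(mid) < m:
            lo = mid
        else:
            hi = mid
    return hi, dist(hi)
```

With a single type, k = 5, uniform q, and m = 2.5, the search returns \(\lambda\) close to 1 and d* close to uniform; smaller m gives \(\lambda < 1\) and a distribution skewed toward small amounts.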

5. Existence of Equilibria. We have seen that the system is well behaved
if the agents all follow a threshold strategy; we now want to show that,
if the discount factor \(\delta\) is sufficiently large for all agents, there is a nontrivial
approximate Nash equilibrium where they do so (that is, an approximate Nash
equilibrium where all the agents use \(s_k\) for some k > 0). To understand why
we need \(\delta\) to be sufficiently large, note that if \(\delta\) is small, then agents have no
incentive to work. Intuitively, if future utility is sufficiently discounted, then
all that matters is the present, and there is no point in volunteering to work.
Thus, for sufficiently small \(\delta\), \(s_0\) is the only equilibrium. To show that there
is a nontrivial equilibrium if the discount factor is sufficiently large, we first
show that, if every other agent is playing a threshold strategy, then there is
an approximate best reply that is also a threshold strategy. Furthermore, we
show that the best-reply function is monotone; that is, if some agents change
their strategy to one with a higher threshold, no other agent can do better
by lowering his threshold. This makes our game one with what Milgrom and
Roberts [32] call strategic complementarities. Using results of Tarski [43],
Topkis [45] showed that there are pure strategy equilibria in such games, since the
process of starting with a strategy profile where everyone always volunteers
(i.e., the threshold is \(\infty\)) and then iteratively computing the best-reply profile
to it converges to a Nash equilibrium in pure strategies. This procedure also
provides an efficient algorithm for explicitly computing equilibria.

To see that threshold strategies are approximately optimal, consider a single
agent i of type t and fix the vector \(\vec{k}\) of thresholds used by the other agents.
If we assume that the number of agents is large, what an agent i does has
essentially no effect on the behavior of the system (although it will, of course,
affect that agent's payoffs). In particular, this means that the distribution q of
Theorem 4.1 characterizes the distribution of money in the system. This
distribution, together with the vector \(\vec{k}\) of thresholds, determines what fraction
of agents volunteers at each step. This, in turn, means that from the perspective
of agent i, the problem of finding an optimal response to the strategies
of the other agents reduces to finding an optimal policy in a Markov decision
process (MDP) \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\). The behavior of the MDP \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\) depends on two
probabilities: \(p_u\) and \(p_d\). Informally, \(p_u\) is the probability of i earning a dollar
during each round if he is willing to volunteer, and \(p_d\) is the probability that i
will be chosen to make a request during each round. Note that \(p_u\) depends on
m, \(\vec{k}\), and t (although it turns out that \(p_d\) depends only on n, the number of
agents in the system); if the dependence of \(p_u\) on m, \(\vec{k}\), and/or t is important,
we add the relevant parameters to the superscript, writing, for example, \(p_u^{m,\vec{k}}\).
We show that the optimal policy for i in \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\) is a threshold policy, and
that this policy is an \(\varepsilon\)-optimal strategy for G. Importantly, the same policy
is optimal independent of the value of n. This allows us to ignore the exact
size of the system in our later analysis.

For our results, it will be important to understand how \(p_u\), \(p_d\), and t
affect the optimal policy for \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\), and thus the \(\varepsilon\)-optimal strategies in
the game. We use this understanding in this section to show that there exist
nontrivial equilibria in Lemma 5.4 and for a number of results in our companion
paper [28].

In the following lemma, whose proof (along with the relevant formal definitions)
is deferred to Appendix B, Equation (5.1) quantifies the effects of these
parameters. When choosing whether he should volunteer with his current
amount of money, an agent faces a choice of whether to pay a utility cost of
\(\alpha_t\) now in exchange for a discounted payoff of \(\gamma_t\) when he eventually spends
the resulting dollar. His choice will depend on how much time he expects
to pass before he spends that dollar (captured by the random variable J in
Equation (5.1)), which in turn depends on his current amount of money k and
the probabilities \(p_u\) and \(p_d\). The following lemma quantifies this calculation.

Lemma 5.1. Consider the games \(G_n = (T, \vec{f}, h, m, n)\) (where T, \(\vec{f}\), h, and
m are fixed, but n may vary). There exists a k such that for all n, \(s_k\) is an
optimal policy for \(\mathcal{P}_{G_n,\vec{S}(\vec{k}),t}\). The threshold k is the maximum value of \(\kappa\) such
that

\[ \alpha_t \le E\big[(1 - (1 - \delta_t)/n)^{J(\kappa, p_u, p_d)}\big]\, \gamma_t, \tag{5.1} \]

where \(J(\kappa, p_u, p_d)\) is a random variable whose value is the first round in which
an agent starting with \(\kappa\) dollars, using strategy \(s_\kappa\), and with probabilities \(p_u\) and
\(p_d\) of earning a dollar and of being chosen given that he volunteers, respectively,
runs out of money.
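The expectation in Equation (5.1) can be evaluated exactly for a simple random walk on the agent's amount of money. The sketch below is illustrative and makes assumptions of ours: in each round the agent spends a dollar with probability `p_d` (if he has one), earns a dollar with probability `p_u` (if he is below his threshold), these events are mutually exclusive, `p_d > 0`, and `alpha <= gamma`. Both function names are hypothetical. The scan in `best_threshold` stops at the first threshold that violates (5.1), which relies on the expectation being non-increasing in \(\kappa\).

```python
def discounted_exhaustion(kappa, p_u, p_d, disc):
    """E[disc**J], where J is the first round at which an agent who starts
    with kappa dollars and uses threshold kappa has no money left."""
    value = 1.0        # with kappa == 0 the agent is already out of money
    c_above = None     # discounted-drop factor of the level just above
    for level in range(kappa, 0, -1):
        if level == kappa:
            # At the threshold the agent does not volunteer; he can only spend.
            c = disc * p_d / (1 - disc * (1 - p_d))
        else:
            # Below the threshold he may earn a dollar and must then fall back.
            c = disc * p_d / (1 - disc * (1 - p_u - p_d) - disc * p_u * c_above)
        value *= c     # c = E[disc**(time to fall from level to level - 1)]
        c_above = c
    return value


def best_threshold(alpha, gamma, delta, n, p_u, p_d, k_max=500):
    """Largest kappa <= k_max satisfying alpha <= E[disc**J] * gamma."""
    disc = 1 - (1 - delta) / n   # per-round discount, as in Equation (5.1)
    best = 0
    for kappa in range(1, k_max + 1):
        if alpha > discounted_exhaustion(kappa, p_u, p_d, disc) * gamma:
            break
        best = kappa
    return best
```

Raising `delta` raises the per-round discount factor and so can only raise the returned threshold, matching the intuition that more patient agents are willing to hold more money.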

The following theorem shows that an optimal threshold policy for \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\)
is an \(\varepsilon\)-optimal strategy for G. In particular, this means that Equation (5.1)
allows us to understand how changing parameters affects an \(\varepsilon\)-optimal strategy
for G, not just for \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\).

Theorem 5.2. For all games \(G = (T, \vec{f}, h, m, n)\), all vectors \(\vec{k}\) of thresholds,
and all \(\varepsilon > 0\), there exist \(n^*\) and \(\delta^*_n\) such that for all \(n > n^*\), types
\(t \in T\), and \(\delta_t > \delta^*_n\), an optimal threshold policy for \(\mathcal{P}_{G,\vec{S}(\vec{k}),t}\) is an \(\varepsilon\)-best reply
to the strategy profile \(\vec{S}(\vec{k})_{-i}\) for every agent i of type t.

We defer the proof of Theorem 5.2 to the appendix. While, in this and later
theorems, the acceptable values of \(\delta^*_n\) depend on n, they are independent of n if,
as we suggest in Section 6, the Markov chain from Section 4 is rapidly mixing.

Given a game \(G = (T, \vec{f}, h, m, n)\) and a vector \(\vec{k}\) of thresholds, Lemma 5.1
gives an optimal threshold \(k'_t\) for each type t. Theorem 5.2 guarantees that
\(s_{k'_t}\) is an \(\varepsilon\)-best reply to \(\vec{S}_{-i}(\vec{k})\), but does not rule out the possibility of other
best replies. However, for ease of exposition, we will call \(k'_t\) the best reply to
\(\vec{S}_{-i}\) and call \(BR_G(\vec{k}) = \vec{k}'\) the best-reply function. The following lemma shows
that this function is monotone (non-decreasing). Along the way, we prove that
several other quantities are monotone. First, we show that \(\lambda_{m,\vec{k}}\), the value of
\(\lambda\) from Lemma 4.2 given m and \(\vec{k}\), is non-decreasing in m and non-increasing
in \(\vec{k}\). We use this to show that \(p_u^{m,\vec{k}}\) is non-increasing in \(\vec{k}\), which is needed to
show the monotonicity of \(BR_G\). We defer the proof to Appendix B.

Lemma 5.3. Consider the family of games \(G_m = (T, \vec{f}, h, m, n)\) and the
strategies \(\vec{S}(\vec{k})\), for \(mhn < \sum_t f_t k_t hn\). For this family of games, \(\lambda_{m,\vec{k}}\) is
non-decreasing in m and non-increasing in \(\vec{k}\); \(p_u^{m,\vec{k}}\) is non-decreasing in m
and non-increasing in \(\vec{k}\); and the function \(BR_G\) is non-decreasing in \(\vec{k}\) and
non-increasing in m.

The intuition behind monotonicity is easy to explain: if the agents other
than agent i use a higher threshold, then they will volunteer more often. Thus,
agent i is less likely to be chosen when he volunteers, and thus he will need to
volunteer more often (and so use a higher threshold himself). Monotonicity is
enough to guarantee the existence of an equilibrium. We actually know that
there is an equilibrium even without the use of monotonicity. If all agents
choose a threshold of $0, so no agent ever volunteers, then clearly i's best
response is also never to volunteer; getting a dollar is useless if it can never be
spent. Fortunately, we can use monotonicity to show that there is a nontrivial
equilibrium in threshold strategies as well. Indeed, to guarantee the existence of
a nontrivial equilibrium, it suffices to show there is some vector \(\vec{k}\) of thresholds
such that \(BR_G(\vec{k}) > \vec{k}\). The following lemma, whose proof is again deferred to
Appendix B, shows that we can always find such a point for sufficiently large
\(\delta_t\).

Lemma 5.4. For all games \(G = (T, \vec{f}, h, m, n)\), there exists a \(\delta^* < 1\) such
that if \(\delta_t > \delta^*\) for all t, there is a vector \(\vec{k}\) of thresholds such that \(BR_G(\vec{k}) > \vec{k}\).

We are now ready to prove our main theorem: there exists a non-trivial 
equilibrium where all agents play threshold strategies greater than zero. 

Theorem 5.5. For all games \(G = (T, \vec{f}, h, m, 1)\) and all \(\varepsilon\), there exist
\(n^*\) and \(\delta^*_n\) such that, if \(n > n^*\) and \(\delta_t > \delta^*_n\) for all t, then there exists
a nontrivial vector \(\vec{k}\) of thresholds that is an \(\varepsilon\)-Nash equilibrium. Moreover,
there exists a greatest such vector.

Proof. By Lemma 5.3, \(BR_G\) is a non-decreasing function on a complete
lattice, so Tarski's fixed point theorem [43] guarantees the existence of a greatest
and least fixed point; these fixed points are equilibria. The least fixed point is
the trivial equilibrium. We can compute the greatest fixed point by starting
with the strategy profile \((\infty, \ldots, \infty)\) (where each agent uses the strategy \(s_\infty\) of
always volunteering) and considering \(\varepsilon\)-best-reply dynamics, that is, iteratively
computing the \(\varepsilon\)-best-reply strategy profile. Monotonicity guarantees this
process converges to the greatest fixed point, which is an equilibrium (and is bound
to be an equilibrium in pure strategies, since the best reply is always a pure
strategy). Since there is a finite amount of money, this process needs to be
repeated only a finite number of times. By Lemma 5.4 there exists a \(\vec{k}\) such that
\(BR_G(\vec{k}) > \vec{k}\). Monotonicity then guarantees that \(BR_G(BR_G(\vec{k})) \ge BR_G(\vec{k})\)
and similarly for any number of applications of \(BR_G\). If \(\vec{k}^*\) is the greatest
fixed point of \(BR_G\), then \(\vec{k}^* > \vec{k}\). Thus, the greatest fixed point is a nontrivial
equilibrium. □

The proof of Theorem 5.5 also provides an algorithm for finding equilibria
that seems efficient in practice: start with the strategy profile \((\infty, \ldots, \infty)\) and
iterate the best-reply dynamics until an equilibrium is reached.
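The iteration just described can be sketched in a few lines. This is an illustration rather than the paper's implementation: `greatest_equilibrium` takes any monotone best-reply function, a large finite threshold stands in for \(\infty\), and `example_best_reply` is a hypothetical single-type best-reply function constructed by us to have fixed points at 0, 5, and 10.

```python
def greatest_equilibrium(best_reply, start, max_iters=10_000):
    """Iterate best-reply dynamics from `start` until a fixed point is hit.

    `best_reply` maps a threshold (or a tuple of thresholds, one per type)
    to the best-reply threshold(s); it is assumed to be non-decreasing.
    """
    k = start
    for _ in range(max_iters):
        nxt = best_reply(k)
        if nxt == k:
            return k
        k = nxt
    raise RuntimeError("best-reply dynamics did not converge")


def example_best_reply(k):
    """Hypothetical non-decreasing best reply with fixed points 0, 5, 10."""
    if k < 5:
        return max(0, k - 1)   # below 5 the dynamics drift down to 0
    if k == 5:
        return 5               # an unstable equilibrium
    return min(10, k + 1)      # above 5 the dynamics climb to 10
```

Starting from a large threshold such as 25 the iteration stops at 10, the greatest fixed point; starting from 4 it falls to the trivial equilibrium 0.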




Fig. 5.1. A hypothetical best-reply function with one type of agent.
(Horizontal axis: strategy of rest of agents.)

There is a subtlety in our results. In general, there may be many equilibria.
From the perspective of social welfare, some will be better than others. As
we show in our companion paper, strategies that use smaller (but nonzero)
thresholds increase social welfare. Consider the best-reply function shown in
Figure 5.1. In the game G in the example, there is only one type of agent, so
\(BR_G : \mathbb{N} \to \mathbb{N}\). In equilibrium, we must have \(BR_G(k) = k\); that is,
an equilibrium is characterized by a point on the line y = x. This example
has three equilibria, where all agents play \(s_0\), \(s_5\), and \(s_{10}\) respectively. The
strategy profile where all agents play \(s_5\) is the equilibrium that maximizes
social welfare, while \(s_{10}\) is the greatest equilibrium.

In the rest of this paper, we focus on the greatest equilibrium in all our
applications (although a number of our results hold for all nontrivial
equilibria). This equilibrium has several desirable properties. First, it is guaranteed
to be stable; best-reply dynamics from nearby points converge to it. By way
of contrast, best-reply dynamics moves the system away from the equilibrium
\(s_5\) in Figure 5.1. Unstable equilibria are difficult to find in practice, and seem
unlikely to be maintained for any length of time. Second, the "greatest"
equilibrium is the one found by the natural algorithm given in Theorem 5.5. The
proof of the theorem shows that it is also the outcome that will occur if agents
adopt the reasonable initial strategy of starting with a large threshold and
then using best-reply dynamics. Finally, by focusing on the worst nontrivial
equilibrium, our results provide guarantees on social welfare, in the same way
that results on price of anarchy [40] provide guarantees (since price of anarchy
considers the social welfare of the Nash equilibrium with the worst social
welfare).

6. Simulations. Theorem 4.1 proves that, for a sufficiently large number
n of agents, and after a sufficiently large number r of rounds, the distribution
of wealth will almost always be close to the distribution that minimizes relative
entropy. In this section, we simulate the game to gain an understanding of how
large n and r need to be in practice. The simulations show that our theoretical
results apply even to relatively small systems; we get tight convergence with
a few thousand agents, and weaker convergence for smaller numbers, in very
few rounds (indeed, a constant number per agent).



Fig. 6.1. Maximum distance from minimum relative entropy distribution over \(10^6\)
timesteps. (Horizontal axis: number of agents.)

The first simulation explores the tightness of convergence to the distribution
that minimizes relative entropy for various values of n. We used a single
type of agent, with \(\beta = \rho = \chi = 1\), m = 2, and k = 5. For each value of n, the
simulation was started with a distribution of money as close as possible to the
distribution d* that minimizes relative entropy to the distribution q defined
in Theorem 4.1 that characterizes the distribution of money in equilibrium
(when the threshold strategy \(s_5\) is used). We then computed the maximum
Euclidean distance between d* and the observed distribution over \(10^6\) rounds.
As Figure 6.1 shows, the system does not move far from d* once it is there.
For example, if n = 5000, the system is never more than distance .001 from
d*. If n = 25,000, it is never more than .0002 from d*.

Figure 6.1 does show a larger distance for n = 1000, although in absolute
terms it is still small. The next simulation shows that, while the system may
occasionally move away from d*, it quickly converges back to it. We averaged
10 runs of the Markov chain, starting from an extreme distribution (every
agent has either $0 or $5), and considered the average time needed to come
within various distances of d*. As Figure 6.2 shows, after 2 rounds per agent,
on average, the Euclidean distance from the average distribution of money to
d* is .008; after 3 rounds per agent, the distance is down to .001.

Fig. 6.2. Distance from minimum relative entropy distribution with 1000 agents.

Fig. 6.3. Average time to get within .001 of the minimum relative entropy distribution.



Finally, we considered more carefully how quickly the system converges to
d* for various values of n. There are approximately \(k^n\) possible states, so the
convergence time could in principle be quite large. However, we suspect that
the Markov chain that arises here is rapidly mixing, which means that it will
converge significantly faster (see [29] for more details about rapid mixing). We
believe that the actual time needed is O(n). This behavior is illustrated in
Figure 6.3, which shows that for our example chain (again averaged over 10
runs), after approximately 3n steps, the Euclidean distance between the actual
distribution of money in the system and d* is less than .001. This suggests
that we should expect the system to converge in a constant number of rounds
per agent.

7. Discussion. We have given a formal analysis of a scrip system and
have shown that approximate equilibria exist in threshold strategies and that
the distribution of money in these equilibria is the one that minimizes relative
entropy. As part of our equilibrium argument, we have shown that the best-reply
function is monotone. This proves the existence of equilibria in pure strategies and
permits efficient algorithms to compute these equilibria.

Our model makes a number of assumptions that are worthy of further
discussion. Some of the simplifying assumptions can be relaxed without
significant changes to our results (albeit at the cost of greater strategic and
analytic complexity). At a high level, our results show the system converges
to a steady state when agents follow simple threshold strategies and that there
is in fact an equilibrium in these strategies. If, for example, rather than all
requests having the same value to an agent (\(\gamma_t\)), the value of a request is
stochastic, agents might wish to have thresholds for each type of request. This would
allow an agent to forgo a low-valued request if he is low on money. This makes
the space of agent strategies larger and significantly complicates the proofs in
the appendix, but this high-level characterization still holds.

The most significant assumption we make is that prices are fixed. However,
our results provide insight even if we relax this assumption. With variable
prices, the behavior of the system depends on the value of \(\beta\), the probability
that an agent can satisfy a request. For large \(\beta\), where there are a large number
of agents who can satisfy each request, we expect the resulting competition
to effectively produce a fixed price, so our analysis applies directly. For small
\(\beta\), where there are few volunteers for each request, variable prices can have a
significant impact.

However, allowing prices to be set endogenously, by bidding, has a number 
of negative consequences. For one thing, it removes the ability of the system 
designer to optimize the system using monetary policy. In addition, for small
\(\beta\), it is possible for colluding agents to form a cartel to fix prices on a resource
they control. It also greatly increases the strategic complexity of using the 
system: rather than choosing a single threshold, agents need an entire pricing 
scheme. Finally, the search costs and costs of executing a transaction are likely 
to be higher with floating prices. Thus, we believe that adopting a fixed price 
or a small set of fixed prices is often a reasonable design decision.

Appendix A. Proof of Theorem 4.1

Given a Markov chain \(\mathcal{M}\) over a state space X and states \(x, y \in X\), let \(I^r_{x,y}\) be
the random variable that is 1 if \(\mathcal{M}\) is in state y at time r and the chain started
in state x, and 0 otherwise. Then \(\lim_{r \to \infty} \Pr(I^r_{x,y} = 1)\) is the limit probability of
being in state y given that the Markov chain starts in state x. In general, this
limit does not exist. However, there are well-known conditions under which
the limit exists, and is independent of the initial state x. A Markov chain is
said to be irreducible if every state is reachable from every other state; it is
aperiodic if, for every state x, there exist two cycles from x to itself such that
the gcd of their lengths is 1.

Theorem A.1. [39] If \(\mathcal{M}\) is a finite, irreducible, and aperiodic Markov
chain over state space X, then there exists a \(d : X \to \mathbb{R}\) such that, for all x
and \(y \in X\), \(\lim_{r \to \infty} \Pr(I^r_{x,y} = 1) = d(y)\).

Thus, if we can show that \(\mathcal{M}\) is finite, irreducible, and aperiodic, then the
limit distribution exists and is independent of the start state \(\vec{x}\). This is shown
in the following lemma.

Lemma A.2. If there are at least three agents, then \(\mathcal{M}\) is finite, irreducible,
and aperiodic and therefore has a limit distribution \(\pi\).

Proof. \(\mathcal{M}\) is clearly finite since X is finite. We prove that it is irreducible
by showing that state \(\vec{y}\) is reachable from state \(\vec{x}\) by induction on the distance
\(w = \sum_i |x_i - y_i|\) (i.e., the sum of the absolute differences in the amount of
money each person has in states \(\vec{x}\) and \(\vec{y}\)). If w = 0, then \(\vec{x} = \vec{y}\) so we are
done. Suppose that w > 0 and all pairs of states that are less than w apart
are reachable from each other. Consider a pair of states \(\vec{x}\) and \(\vec{y}\) such that
the distance from \(\vec{x}\) to \(\vec{y}\) is w. Since w > 0 and the total amount of money
is the same in all states, there must exist \(i_1\) and \(i_2\) such that \(x_{i_1} > y_{i_1}\) and
\(x_{i_2} < y_{i_2}\). Thus, in state \(\vec{y}\), \(i_1\) is willing to work (since he has strictly less than
the threshold amount of money) and \(i_2\) has money to pay him (since \(i_2\) has
a strictly positive amount of money). The state \(\vec{z}\) that results from \(i_1\) doing
work for \(i_2\) in state \(\vec{y}\) is of distance w - 2 from \(\vec{x}\). By the induction hypothesis,
\(\vec{z}\) is reachable from \(\vec{x}\). Since \(\vec{y}\) is clearly reachable from \(\vec{z}\), \(\vec{y}\) is reachable from
\(\vec{x}\).

Finally, we must show that \(\mathcal{M}\) is aperiodic. Suppose \(\vec{x}\) is a state such that
there exist three agents \(i_1\), \(i_2\), and \(i_3\) where \(i_1\) has more than 0 dollars and \(i_2\)
and \(i_3\) have less than their threshold amount of money. There must be such
a state by our assumption that \(mhn < \sum_t f_t k_t hn\). Clearly there is a cycle
of length 2 from \(\vec{x}\) to itself: \(i_2\) does work for \(i_1\) and then \(i_1\) does work for
\(i_2\). There is also a cycle of length 3: \(i_2\) does work for \(i_1\), \(i_3\) does work for \(i_2\),
then \(i_1\) does work for \(i_3\). By irreducibility, identifying a single state with this
property is sufficient. □

We next give an explicit formula for the limit distribution. Recall that in
the special case discussed in the main text, \(\beta_t\), \(\chi_t\), and \(\rho_t\) were the same for all
types, so the transition probabilities were symmetric and the limit distribution
was uniform. While with more general values they are no longer symmetric,
they still have significant structure that allows us to give a concise description
of the limit distribution.

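Lemma A.3. For all states \(\vec{x}\) of \(\mathcal{M}\), let \(w_{\vec{x}} = \prod_i (\beta_{\tau(i)} \chi_{\tau(i)} / \rho_{\tau(i)})^{x_i}\), and
let \(Z = \sum_{\vec{y}} w_{\vec{y}}\). Then the limit distribution of \(\mathcal{M}\) is \(\pi(\vec{x}) = w_{\vec{x}}/Z\).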

Proof. Define \(\pi\) by taking \(\pi(\vec{x}) = w_{\vec{x}}/Z\), where \(w_{\vec{x}}\) and Z are as in the
statement of the lemma. If \(T_{\vec{x}\vec{y}}\) is the probability of transitioning from state \(\vec{x}\)
to state \(\vec{y}\), it is well known that it suffices to show that \(\pi\) satisfies the detailed
balance condition [39], i.e., \(\pi(\vec{x}) T_{\vec{x}\vec{y}} = \pi(\vec{y}) T_{\vec{y}\vec{x}}\) for all states \(\vec{x}\) and \(\vec{y}\), and \(\pi\) is
a probability measure. The fact that \(\pi\) is a probability measure is immediate
from its definition. To check the first condition, let \(\vec{x}\) and \(\vec{y}\) be adjacent states
such that \(\vec{y}\) is reached from \(\vec{x}\) by i spending a dollar and j earning a dollar. This
means that for the transition from \(\vec{x}\) to \(\vec{y}\) to happen, i must be chosen to spend
a dollar and j must be able to work and chosen to earn the dollar. Similarly,
for the reverse transition to happen, j must be chosen to spend a dollar and i
must be able to work and chosen to earn the dollar. All other agents have the
same amount of money in each state, and so will make the same decision in
each state. Thus the probabilities associated with each transition differ only
in the relative likelihoods of i and j being chosen at each point. These may
differ for three reasons: one might be more likely to be able to satisfy requests
(\(\beta\)), to want to make requests (\(\rho\)), or to be chosen to satisfy requests (\(\chi\)).
Thus, for some p, which captures the effect of other agents volunteering on the
likelihood of i and j being chosen, we can write the transition probabilities as
\(T_{\vec{x}\vec{y}} = \rho_{\tau(i)} \beta_{\tau(j)} \chi_{\tau(j)}\, p\) and \(T_{\vec{y}\vec{x}} = \rho_{\tau(j)} \beta_{\tau(i)} \chi_{\tau(i)}\, p\). From the definition of \(\pi\), we
have that

\[ \frac{\pi(\vec{x})}{\pi(\vec{y})} = \frac{\beta_{\tau(i)} \chi_{\tau(i)} \rho_{\tau(j)}}{\rho_{\tau(i)} \beta_{\tau(j)} \chi_{\tau(j)}} = \frac{T_{\vec{y}\vec{x}}}{T_{\vec{x}\vec{y}}}. \]

Thus, \(\pi(\vec{x}) T_{\vec{x}\vec{y}} = \pi(\vec{y}) T_{\vec{y}\vec{x}}\), as desired. □

Note that for the special case considered in the main text, Lemma A.3
shows that the limit distribution is the uniform distribution.

The limit distribution tells us the long run probability of being in a given
state. Theorem 4.1 does not mention states directly, but rather the
distributions of money associated with a state. In order to prove the theorem, we
need to know the probability of being in some state associated with a given
distribution. This is established in the following lemma.

Lemma A.4. Let \(\pi\) be the limit distribution from Lemma A.3 and let
\(V(d) = H(d) - H(\vec{f}) - \log Z + \sum_t \sum_{i=0}^{k_t} i\, d(t,i) \log \omega_t\) (where H is the standard
entropy function; that is, \(H(d) = -\sum_{t,i} d(t,i) \log d(t,i)\)). For all \(d \in \Delta_{\vec{f},m,\vec{k}}\),
either \(\pi(\{\vec{x} \mid d^{\vec{x}} = d\}) = 0\) or \(F(hn) e^{hnV(d)} \le \pi(\{\vec{x} \mid d^{\vec{x}} = d\}) \le G(hn) e^{hnV(d)}\),
where F and G are polynomials.

Proof. Before computing the probability of being in such a state, we first
compute the number of states. It is possible that there is no state \(\vec{x}\) such that
\(d = d^{\vec{x}}\) (e.g., if hn is odd and d has half the agents with 0 dollars). If there
is such a state \(\vec{x}\), each such state has hn d(t, i) agents of type t with i dollars.
Thus, the number of states \(\vec{x}\) with \(d^{\vec{x}} = d\) is the number of ways to divide the
agents into groups of these sizes. Since there are \(hn f_t\) agents of type t, the
number of such states is

\[ \prod_t \binom{hn f_t}{hn\, d(t,0), \ldots, hn\, d(t,k_t)}. \]

To complete the proof, we use the fact (shown in the proof of Lemma 3.11 of 
PU) that 

$$\frac{1}{F(hn)}\,e^{hn f_t H(d_t)} \le \binom{hn f_t}{hn\,d(t,0), \ldots, hn\,d(t,k_t)} \le G(hn)\,e^{hn f_t H(d_t)},$$

where $F$ and $G$ are polynomial in $hn$, and $d_t$ is the distribution restricted
to a single type $t$ (i.e., $d_t(i) = d(t,i)/\sum_i d(t,i)$). The (generalized) grouping
property [11] of entropy allows us to express $H(d)$ in terms of the entropy of
the distributions for each fixed $t$, or the $H(d_t)$. Because $f_t = \sum_i d(t,i)$, this
has the particularly simple form $H(d) = H(\vec{f}) + \sum_t f_t H(d_t)$. Thus, up to a
polynomial factor, the number of such states is



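$$\prod_t e^{hn f_t H(d_t)} = e^{hn \sum_t f_t H(d_t)} = e^{hn(H(d) - H(\vec{f}))}.$$

By Lemma A.3, each of these states has the same probability $\pi(x)$. Thus,
dropping the superscript $x$ on $d^x$ for brevity, the probability of being in such
a state is:

$$\begin{aligned}
e^{hn(H(d)-H(\vec{f}))}\,\pi(x)
&= e^{hn(H(d)-H(\vec{f}))} \prod_i \left(\beta_{\tau(i)}\chi_{\tau(i)}/\rho_{\tau(i)}\right)^{x_i} / Z \\
&= e^{hn(H(d)-H(\vec{f}))}\, Z^{-hn} \prod_i \omega_{\tau(i)}^{x_i} \\
&= e^{hn(H(d)-H(\vec{f}))}\, Z^{-hn} \prod_t \prod_{i=0}^{k_t} \left(\omega_t^{\,i}\right)^{hn\,d(t,i)} \\
&= e^{hn(H(d)-H(\vec{f}))}\, Z^{-hn} \prod_t \prod_{i=0}^{k_t} e^{hn\,i\,d(t,i)\log\omega_t} \\
&= e^{hn\left(H(d)-H(\vec{f})-\log Z+\sum_t\sum_{i=0}^{k_t} i\,d(t,i)\log\omega_t\right)} \\
&= e^{hnV(d)}.
\end{aligned}$$

□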
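The entropy bound on the multinomial coefficient used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and the explicit bound $0 \le N H(p) - \log\binom{N}{c_0,\ldots,c_k} \le (k+1)\log(N+1)$ is the standard method-of-types estimate, used here as an assumed stand-in for the unspecified polynomials $F$ and $G$.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def log_multinomial(counts):
    """log of N! / (c_0! * ... * c_k!), computed with lgamma to avoid overflow."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy_gap(counts):
    """N * H(counts / N) - log(multinomial coefficient).

    The gap is nonnegative and grows only logarithmically in N, which is
    what "equal up to a polynomial factor" means for the counts themselves.
    """
    n = sum(counts)
    return n * entropy([c / n for c in counts]) - log_multinomial(counts)

if __name__ == "__main__":
    for scale in (1, 10, 100, 1000):
        counts = [5 * scale, 3 * scale, 2 * scale]
        print(sum(counts), entropy_gap(counts))
```

The gap stays within a logarithmic envelope as the number of agents grows, so the exponential term $e^{hn f_t H(d_t)}$ dominates the count of states.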

Theorem 4.1 says that there exists a $q \in \Delta_{\vec{f},m,\vec{k}}$ (i.e., a probability
distribution on agent types $t$ and amounts of money $i$) with certain properties. We
now define the appropriate $q$. Let

$$q(t,i) = \frac{(\omega_t)^i}{\sum_{t'}\sum_{j=0}^{k_{t'}} (\omega_{t'})^j}. \tag{A.1}$$

It is not immediately clear why this is the right choice of $q$. As the following
lemma shows, this definition allows us to characterize the distribution that
maximizes the probability of being in a state corresponding to that distribution
(as given by Lemma A.4) in terms of relative entropy.

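Lemma A.5. The unique maximum of $V(d) = H(d) - H(\vec{f}) - \log Z + \sum_t \sum_{i=0}^{k_t} i\,d(t,i)\log\omega_t$ on $\Delta_{\vec{f},m,\vec{k}}$ occurs at $d^*_q$.

Proof. For brevity, we drop the superscript $x$ on $d$ and let $Y = \sum_t \sum_i (\omega_t)^i$.

$$\begin{aligned}
\operatorname{argmax}_d V(d)
&= \operatorname{argmax}_d \Big(H(d) - H(\vec{f}) - \log Z + \sum_t \sum_{i=0}^{k_t} i\,d(t,i)\log\omega_t\Big) \\
&= \operatorname{argmax}_d \Big(H(d) + \sum_t \sum_{i=0}^{k_t} i\,d(t,i)\log\omega_t\Big) \\
&= \operatorname{argmax}_d \sum_t \sum_{i=0}^{k_t} \big[-d(t,i)\log d(t,i) + i\,d(t,i)\log\omega_t\big] \\
&= \operatorname{argmax}_d \sum_t \sum_{i=0}^{k_t} \big[-d(t,i)\log d(t,i) + d(t,i)\log(q(t,i)Y)\big] \\
&= \operatorname{argmax}_d \sum_t \sum_{i=0}^{k_t} \big[-d(t,i)\log d(t,i) + d(t,i)\log q(t,i) + d(t,i)\log Y\big] \\
&= \operatorname{argmax}_d \log Y + \sum_t \sum_{i=0}^{k_t} \big[-d(t,i)\log d(t,i) + d(t,i)\log q(t,i)\big] \\
&= \operatorname{argmin}_d \sum_t \sum_{i=0}^{k_t} \big[d(t,i)\log d(t,i) - d(t,i)\log q(t,i)\big] \\
&= \operatorname{argmin}_d \sum_t \sum_{i=0}^{k_t} d(t,i)\log\frac{d(t,i)}{q(t,i)} \\
&= \operatorname{argmin}_d H(d\,\|\,q).
\end{aligned}$$

By definition, $d^*$ minimizes $H(d\,\|\,q)$. It is unique because $H$ (and thus $V$)
is a strictly concave function on a closed convex set. □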

Lemma A.5 tells us that the most likely distributions of money to be
observed are those with low relative entropy to $q$. Among all distributions
in $\Delta_{\vec{f},m,\vec{k}}$, relative entropy is minimized by $d^*$. However, given $n$, it is quite
possible that $d^*$ is not $d^x$ for any $x$. For example, if $d^*(t,i) = 1/3$ for some $t$
and $i$, but $f_t hn = 16$, then $d^x(t,i) = d^*(t,i)$ only if exactly 16/3 agents of type
$t$ have $i$ dollars, which cannot be the case. However, as the following lemma
shows, for sufficiently large $n$, we can always find a $d^x$ that is arbitrarily close
to $d^*$. For convenience, we use the 1-norm as our notion of distance.

Lemma A.6. For all $\varepsilon$, there exists $n_\varepsilon$ such that, if $n > n_\varepsilon$, then for some
state $x$, $\|d^x - d^*\|_1 < \varepsilon$.

Proof. Given $n$, we construct $d \in \Delta_{\vec{f},m,\vec{k}}$ that is of the form $d^x$ and is close
to $d^*$ in a number of steps. As a first step, for all $t$ and $i$, let $d_1(t,i)$ be the result
of rounding $d^*(t,i)$ to the nearest $1/hn$ (where ties are broken arbitrarily). The
function $d_1$ may not be in $\Delta_{\vec{f},m,\vec{k}}$; we make minor adjustments to it to get a
function in $\Delta_{\vec{f},m,\vec{k}}$. First, note that we may have $\sum_i d_1(t,i) \ne f_t$. Since $f_t$
is a multiple of $1/hn$, we can get a function $d_2$ that satisfies these constraints by
modifying each term $d_1(t,i)$ by either adding $1/hn$ to it, subtracting $1/hn$ from
it, or leaving it alone. Such a function $d_2$ may still violate the final constraint
that $\sum_{t,i} i\,d_2(t,i) = m$. We construct a function $d_3$ that satisfies this constraint
(while continuing to satisfy the constraint that $\sum_i d_3(t,i) = f_t$) as follows.
Note that if we increase $d_2(t,i)$ by $1/hn$ and decrease $d_2(t,j)$ by $1/hn$, then
we keep $\sum_i d_2(t,i) = f_t$, and change $\sum_i i\,d_2(t,i)$ by $(i-j)/hn$. Since
each term $d_2(t,i)$ is a multiple of $1/hn$ and $m$ is a multiple of $1/h$, we can
perform these adjustments until all the constraints are satisfied.

The rounding to create $d_1$ changed each $d_1(t,i)$ by at most $1/hn$, so
$\|d^* - d_1\|_1 \le (\sum_t k_t + 1)/hn$. Since each term $d_1(t,i)$ was changed by at most $1/hn$
to obtain $d_2(t,i)$, we have $\|d_1 - d_2\|_1 \le (\sum_t k_t + 1)/hn$. Let $c = \max_t(\max(k_t - m, m))$.
Each movement of up to $1/hn$ in the creation of $d_1$ and $d_2$ altered $m$
by at most $c/hn$. Thus at most $2c$ movements are needed in the creation of $d_3$
for each pair $(t,i)$. Therefore, $\|d_2 - d_3\|_1 \le (\sum_t k_t + 1)2c/hn$. By the triangle
inequality, $\|d^* - d_3\|_1 \le (\sum_t k_t + 1)(2c+2)/hn$, which is $O(1/n)$. Hence, for
$n_\varepsilon$ sufficiently large, the resulting $d_3$ will always be within distance $\varepsilon$ of $d^*$.

Finally, we need to show that $d_3 = d^x$ for some $x$. Each $d_3(t,i)$ is a multiple
of $1/hn$. There are $hn$ agents in total, so we can find such an $x$ by taking any
allocation of money such that $d_3(t,i)hn$ agents of type $t$ have $i$ dollars. □
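The rounding construction in this proof can be sketched in code for a single type. This is an illustrative simplification rather than the construction itself: `round_distribution` is a hypothetical helper, it handles one type only (so the per-type mass constraint becomes "counts sum to the number of agents"), it uses largest-remainder rounding in place of the arbitrary tie-breaking above, and it repairs the money constraint greedily one dollar at a time.

```python
def round_distribution(d_star, n_agents, total_money):
    """Return integer counts c with sum(c) == n_agents and
    sum(i * c[i]) == total_money, chosen near n_agents * d_star.

    d_star[i] is the target fraction of agents holding i dollars
    (single type; the threshold is len(d_star) - 1).
    """
    k = len(d_star) - 1
    if abs(sum(d_star) - 1.0) > 1e-9:
        raise ValueError("d_star must sum to 1")
    if not 0 <= total_money <= k * n_agents:
        raise ValueError("total_money is infeasible for this threshold")

    targets = [n_agents * p for p in d_star]
    counts = [int(t) for t in targets]  # round every entry down first

    # Largest-remainder step: hand the leftover agents to the entries
    # that lost the most by rounding down.
    by_remainder = sorted(range(k + 1),
                          key=lambda i: targets[i] - counts[i],
                          reverse=True)
    for i in by_remainder[: n_agents - sum(counts)]:
        counts[i] += 1

    # Repair the money constraint: moving one agent from level i to
    # level i + step changes the total money by exactly step.
    money = sum(i * c for i, c in enumerate(counts))
    while money != total_money:
        step = 1 if money < total_money else -1
        i = next(j for j in range(k + 1)
                 if counts[j] > 0 and 0 <= j + step <= k)
        counts[i] -= 1
        counts[i + step] += 1
        money += step
    return counts
```

For instance, with 16 agents and a target of (1/2, 1/3, 1/6), no state matches the target exactly, but the sketch returns integer counts whose fractions differ from the target by less than 1/16 in each entry.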

We are now ready to prove Theorem 4.1. We repeat the statement here
for the reader's convenience.

Theorem 4.1. For all games $(T, \vec{f}, h, m, 1)$, all vectors $\vec{k}$ of thresholds, and
all $\varepsilon > 0$, there exist $q \in \Delta_{\vec{f},m,\vec{k}}$ and $n_\varepsilon$ such that, for all $n > n_\varepsilon$, there exists
a round $r^*$ such that, for all $r > r^*$, we have $\Pr(I^r_{q,n,\varepsilon} = 1) > 1 - \varepsilon$.

Proof. From Lemma A.3 we know that, after a sufficient amount of time,
the probability of being in state $x$ will be close to $\pi_x = w_x/Z$. Since $\mathcal{M}$
converges to a limit distribution, it is sufficient to show that the theorem
holds in the limit as $r \to \infty$. If the theorem holds in the limit for some
$\varepsilon' < \varepsilon$, then we can take $r$ large enough that the L1 distance between the
distribution of the chain at time $r$ and the limit distribution (i.e., treating the
distributions as vectors and computing the sum of the absolute values of their
differences) is at most $\varepsilon - \varepsilon'$.

The remainder of the proof is essentially that of Theorem 3.13 in [IT] 
(applied in a very different setting). Let $V(d) = H(d) - H(\vec{f}) - \log Z + \sum_t \sum_{i=0}^{k_t} i\,d(t,i)\log\omega_t$.
We show there exists a value $v_L$ such that, for all states
$x$ such that $d^x$ is not within $\varepsilon$ of $d^*$ we have $V(d^x) \le v_L$, and a value $v_H > v_L$
such that $v_H = V(d^y)$ for some point $y$ such that $d^y$ is within distance $\varepsilon$ of
$d^*$. Lemma A.4 then shows that it is exponentially more likely that $d^{x^r} = d^y$
than any distribution $d$ such that $V(d) \le v_L$. If $x^r = y$ then $I^r_{q,n,\varepsilon} = 1$, and if
$I^r_{q,n,\varepsilon} = 0$ then $V(d^{x^r}) \le v_L$, so this suffices to establish the theorem.

By Lemma A.5, the unique maximum of $V$ on $\Delta_{\vec{f},m,\vec{k}}$ occurs at $d^*$. The set
$\{d \in \Delta_{\vec{f},m,\vec{k}} \mid \|d^* - d\|_2 \ge \varepsilon\}$ is closed. $V$ is a continuous function, so it takes
some maximum $v_L$ on this set. Pick some $v_H$ such that $v_L < v_H < V(d^*)$.
By the continuity of $V$, there exists an $\varepsilon$ such that if $\|d^* - d\|_1 < \varepsilon$ then
$V(d) \ge v_H$. By Lemma A.6, for sufficiently large $n$, there is always some $x$
such that $\|d^* - d^x\|_1 < \varepsilon$. Thus, for some $x \in X_{\varepsilon,q}$, $V(d^x) \ge v_H$.

$\Pr(I^r_{q,n,\varepsilon} = 1) \ge \Pr(x^r \in \{y \mid d^y = d^x\})$. By Lemma A.4, $\Pr(I^r_{q,n,\varepsilon} = 1)$
is at least $\frac{1}{F(hn)}e^{hnV(d^x)} \ge \frac{1}{F(hn)}e^{hn v_H}$. Now consider a $y$ such
that $I_{q,n,\varepsilon}(y) = 0$. By Lemma A.4, the probability that $d^{x^r} = d^y$ is at
most $G(hn)e^{hnV(d^y)} \le G(hn)e^{hn v_L}$. There are at most $(hn+1)^{\sum_t (k_t+1)}$ such
points, a number which is polynomial in $hn$. Thus, for $G'(hn) = G(hn)(hn+1)^{\sum_t (k_t+1)}$,
the probability that $I^r_{q,n,\varepsilon} = 0$ is at most $G'(hn)e^{hn v_L}$. The ratio of
these probabilities is at most

$$\frac{G'(hn)\,e^{hn v_L}}{\frac{1}{F(hn)}\,e^{hn v_H}} = \frac{G'(hn)F(hn)}{e^{hn(v_H - v_L)}}.$$

This is the ratio of a polynomial to an exponential, so the probability of
seeing a distribution of distance greater than $\varepsilon$ from $d^*$ goes to zero as $n$ goes
to infinity. □

Appendix B. Proofs from Section 5

In this appendix, we provide the omitted proofs from Section 5.
The proof of Theorem 5.2 relies on modeling the game from the perspective
of a single agent. Consider a vector $\vec{k}$ of thresholds and the corresponding
strategy profile $S(\vec{k})$. Fix an agent $i$ of type $t$. Assume that all the agents
other than $i$ continue to play their part of $S(\vec{k})$. What is $i$'s best response? Since
the set of agents is large, $i$'s choice of strategy will have (essentially) no impact
on the distribution of money. By Theorem 4.1, the distribution of money will
almost always be close to a distribution $d^*$. Suppose the distribution were
exactly $d^*$. Since we know the exact distribution of money and the thresholds
used by the other agents, we can calculate the number of each type of agent
that wish to volunteer and thus the probabilities that our single agent will be
able to earn or spend a dollar. Thus, by assuming the distribution of money
is always exactly $d^*$, we can model the game from the perspective of agent $i$
as a Markov Decision Process (MDP). We show in Lemma B.2 that this MDP
has an optimal threshold policy. (Threshold policies are known as monotone
policies in the more general setting where there are more than two actions.)
We then prove that any optimal policy for the MDP is an $\varepsilon$-best reply to the
strategies of the other agents in the actual game.

Taking notation from Puterman [37], we formally define the MDP
$\mathcal{P}_{G,S(\vec{k}),t} = (S, A, p(\cdot \mid s,a), r(s,a))$ that describes the game where all the
agents other than $i$ are playing $S(\vec{k})_{-i}$ and $i$ has type $t$.

• $S = \{0, \ldots, mhn\}$ is the set of possible states for the MDP (i.e., the
possible amounts of money compatible with the distribution $d^*$).

• $A = \{0, 1\}$ is the set of possible actions for the agent, where 0 denotes
not volunteering and 1 denotes volunteering iff another agent who has
at least one dollar makes a request.

• $p_u$ is the probability of earning a dollar, assuming the agent volunteers
(given that all other agents have fixed their thresholds according to
$\vec{k}$ and the distribution of money is exactly $d^*$). Each agent of type
$t'$ who wishes to volunteer can do so with probability $\beta_{t'}$. Assuming
exactly the expected number of agents are able to volunteer, $v_{t'} = \beta_{t'}(f_{t'} - d^*(t',k_{t'}))n$
agents of type $t'$ volunteer. Note that we are
disregarding the effect of $i$ in computing the $v_{t'}$, since this will have a
negligible effect for large $n$. Using the $v_{t'}$s, we can express $p_u$ as the
product of two probabilities: that some agent other than $i$ who has a
dollar is chosen to make a request and that $i$ is the agent chosen to
satisfy it. Thus,

$$p_u = \left(\frac{\sum_{t'}\rho_{t'}\,(f_{t'} - d^*(t',0))}{\sum_{t'}\rho_{t'} f_{t'}}\right)\left(\frac{\chi_t\beta_t}{\sum_{t'}\chi_{t'} v_{t'}}\right). \tag{B.1}$$

• $p_d$ is the probability of agent $i$ having a request satisfied, given that
agent $i$ has a dollar. Given that all agents are playing a threshold
strategy, if the total number $n$ of agents is sufficiently large, then it is
almost certainly the case that some agent will always be willing and
able to volunteer. Thus, we can take $p_d$ to be the probability that
agent $i$ will be chosen to make a request; that is,

$$p_d = \frac{\rho_t}{\sum_{t'}\rho_{t'} f_{t'}\, n}. \tag{B.2}$$

• $r(s,a)$ is the (immediate) expected reward for performing action $a$ in
state $s$. Thus, $r(s,0) = \gamma_t p_d$ if $s > 0$; $r(0,0) = 0$; $r(s,1) = \gamma_t p_d - \alpha_t p_u$
if $s > 0$; and $r(0,1) = -\alpha_t p_u$.

• $p(s' \mid s,a)$ is the probability of being in state $s'$ after performing action
$a$ in state $s$; $p(s' \mid s,a)$ is determined by $p_u$ and $p_d$; specifically, $p(s+1 \mid s,1) = p_u$,
$p(s-1 \mid s,a) = p_d$ if $s > 0$, and the remainder of the
probability is on $p(s \mid s,a)$ (i.e., $p(s \mid s,a) = 1 - (p(s+1 \mid s,a) + p(s-1 \mid s,a))$).

• $u^*(s)$ is the expected utility of being in state $s$ if agent $i$ uses the
optimal policy for the MDP $\mathcal{P}_{G,S(\vec{k}),t}$.

• $u(s,a)$ is the expected utility for performing action $a$ in state $s$, given
that the optimal strategy is followed after this action;

$$u(s,a) = r(s,a) + \delta \sum_{s'=0}^{mhn} p(s' \mid s,a)\,u^*(s').$$
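The MDP just defined is small enough to solve by value iteration, which gives a concrete way to inspect $u^*$ and the optimal policy. The following is a minimal sketch under stated assumptions: `solve_agent_mdp` and its parameters are illustrative names, the values passed for `p_u` and `p_d` are taken as given rather than computed from Equations (B.1) and (B.2), and at the top state `max_money` the agent is assumed to be unable to earn (so that the state space stays finite).

```python
def solve_agent_mdp(alpha, gamma, delta, p_u, p_d, max_money, sweeps=5000):
    """Value iteration for the single-agent MDP.

    Returns (u_star, policy), where policy[s] is 1 if volunteering is
    strictly better than not volunteering in state s, and 0 otherwise.
    Requires p_u + p_d <= 1 and 0 <= delta < 1.
    """
    states = range(max_money + 1)
    u = [0.0] * (max_money + 1)

    def q_value(s, a, values):
        # Spending requires at least one dollar.
        spend = p_d if s > 0 else 0.0
        # Earning requires volunteering; capped at max_money (assumption).
        earn = p_u if (a == 1 and s < max_money) else 0.0
        # Immediate reward r(s, a) as defined in the text.
        reward = gamma * spend - (alpha * p_u if a == 1 else 0.0)
        future = (1.0 - spend - earn) * values[s]
        if spend:
            future += spend * values[s - 1]
        if earn:
            future += earn * values[s + 1]
        return reward + delta * future

    for _ in range(sweeps):
        u = [max(q_value(s, 0, u), q_value(s, 1, u)) for s in states]

    policy = [1 if q_value(s, 1, u) > q_value(s, 0, u) else 0 for s in states]
    return u, policy
```

With, say, `alpha=1.0, gamma=5.0, delta=0.95, p_u=0.1, p_d=0.1, max_money=30`, the computed policy volunteers in low states and stops above some amount of money, and the computed values have non-increasing increments $u^*(s+1) - u^*(s)$.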

To prove Theorem 5.2, we need two preliminary lemmas about the MDP
$\mathcal{P}_{G,S(\vec{k}),t}$.

Lemma B.1. For the MDP $\mathcal{P}_{G,S(\vec{k}),t}$, $u^*(s+2) + u^*(s) \le 2u^*(s+1)$.

Proof. The MDP $\mathcal{P}_{G,S(\vec{k}),t}$ has an optimal stationary policy [37, Theorem
6.2.10] (a policy where the chosen action depends only on the current state).
Let $\pi$ be such a policy. Consider the policy $\pi'$ starting in state $s+1$ that
"pretends" it actually started in state $s$ and is following $\pi$. More precisely, if
$s_0 = s+1$ and $s_j > 0$ for $j = 0, \ldots, k$, define $\pi'(s_0, s_1, \ldots, s_k) = \pi(s_k - 1)$;
otherwise, if $j < k$ is the least index such that $s_j = 0$, define $\pi'(s_0, \ldots, s_k) = \pi(s_k)$.
Given a history $(s_0, \ldots, s_k)$, $j$ is the random variable whose value is
the minimum $i$ such that $s_i = 0$, or $\infty$ if no such value exists. The definition
of $\pi'$ from $\pi$ creates a bijection between histories that start in state $s+1$ and
histories that start in state $s$, such that if $h'$ corresponds to $h$, the probability
of history $h'$ with policy $\pi'$ is the same as the probability of $h$ with policy $\pi$.
Technically, making the mapping a bijection requires the introduction of a new
state $0'$, which intuitively represents the state where the agent has zero dollars
and missed an opportunity to have a request satisfied last round because of
it. More formally, we let $p(0' \mid 0,a) = p_d$ and $p(s \mid 0',a) = p(s \mid 0,a)$. With
this change, the probabilities of corresponding histories are the same because
the probability of transitioning from a state to the one "immediately below"
it (where $s-1$ is immediately below $s$, $0'$ is immediately below 0, and $0'$ is
immediately below itself) is always $p_d$, and the probability of transitioning
from a state to the one "immediately above" it (where $s+1$ is immediately
above $s$, and 1 is immediately above $0'$) is always $p_u$.⁶

This argument shows that an agent starting with $s+1$ dollars "pretending"
to start with $s$ will have the same expected reward each round as an agent
who actually started with $s$ dollars, except during the first round $j$ in a history
such that $s_j = 0$. Thus (treating $j$ as a random variable), we have

$$u^*(s+1) \ge u^*(s) + E[\delta^{j}\gamma_t].$$

Similarly, we can use $\pi$ starting from state $s+2$ to define a policy $\pi''$
starting from state $s+1$, where $i$ "pretends" he has one more dollar and is
using $\pi$, up to the first round $j'$ that he is chosen to make a request with $\pi$ in

6 Note that this means that $0'$ is immediately below 0 but 1 is immediately above $0'$. This
is intended, because $0'$ intuitively represents the state where the agent has 0 dollars and had
a request go unsatisfied due to a lack of money in the previous round, so if he then earns a
dollar he will have 1 dollar regardless of whether or not his request of two rounds previous
was satisfied.

a state where he has no money (in which case he can make the request with
$\pi$ started from $s+2$, but cannot make it with $\pi''$ started from $s+1$); from
that point on, he uses $\pi$. For corresponding histories, the utilities of an agent
starting with $s+1$ dollars and following $\pi''$ and an agent starting with $s+2$
dollars and following $\pi$ will be the same, except during round $j'$ the agent
following $\pi$ will have a request satisfied but the agent following $\pi''$ will not.
Thus,

$$u^*(s+1) \ge u^*(s+2) - E[\delta^{j'}\gamma_t].$$

Since, if $i$ uses $\pi$, he will run out of money sooner if he starts with $s$ dollars
than if he starts with $s+2$ dollars,

$$E[\delta^{j}\gamma_t] \ge E[\delta^{j'}\gamma_t].$$

Thus, $u^*(s+2) + u^*(s) \le 2u^*(s+1)$. □

Lemma B.2. $\mathcal{P}_{G,S(\vec{k}),t}$ has an optimal threshold policy.

Proof. As shown by Puterman [37, Lemma 4.7.1], it suffices to prove that
$u(s,a)$ is subadditive. That is, we need to prove that, for all states $s$,

$$u(s+1,1) + u(s,0) \le u(s+1,0) + u(s,1). \tag{B.3}$$

We consider here only the case that $s > 0$ (the argument is essentially the
same if $s = 0$). Because $s > 0$, $r(s+1,a) = r(s,a)$, so (B.3) is equivalent to

$$\begin{aligned}
& p_u u^*(s+2) + p_d u^*(s) + (1 - p_u - p_d)u^*(s+1) + p_d u^*(s-1) + (1 - p_d)u^*(s) \\
&\quad \le p_d u^*(s) + (1 - p_d)u^*(s+1) + p_u u^*(s+1) + p_d u^*(s-1) + (1 - p_u - p_d)u^*(s).
\end{aligned}$$

This simplifies to

$$u^*(s+2) + u^*(s) \le 2u^*(s+1),$$

which follows from Lemma B.1. □

We can now prove Lemma 5.1 and Theorem 5.2.

Lemma 5.1. Consider the games $G_n = (T, \vec{f}, h, m, n)$ (where $T$, $\vec{f}$, $h$, and
$m$ are fixed, but $n$ may vary). There exists a $k$ such that for all $n$, $s_k$ is an
optimal policy for $\mathcal{P}_{G_n,S(\vec{k}),t}$. The threshold $k$ is the maximum value of $\kappa$ such
that

$$\alpha_t \le E\big[(1 - (1-\delta_t)/n)^{J(\kappa,p_u,p_d)}\big]\gamma_t, \tag{5.1}$$

where $J(\kappa,p_u,p_d)$ is a random variable whose value is the first round in which
an agent starting with $\kappa$ dollars, using strategy $s_\kappa$, and with probabilities $p_u$ and
$p_d$ of earning a dollar and of being chosen given that he volunteers, respectively,
runs out of money.

Proof. Fix $n$. Suppose that an agent is choosing between a threshold of
$\kappa$ and a threshold of $\kappa+1$. These policies only differ when the agent has
$\kappa$ dollars: he will volunteer with the latter but not with the former. If he
volunteers when he has $\kappa$ dollars and is chosen, he will pay a cost of $\alpha_t$ and he
will have $\kappa+1$ dollars. As in the proof of Lemma B.1, we can define a bijection
on histories such that, in corresponding histories of equal probability, an agent
who started with $\kappa$ dollars and is using $s_\kappa$ will always have one less dollar than
an agent who started with $\kappa+1$ dollars and is using $s_{\kappa+1}$, until the first round
$r$ in which the agent using $s_{\kappa+1}$ has zero dollars. This means that in round
$r-1$ the agent using $s_{\kappa+1}$ had a request satisfied but the agent using $s_\kappa$ was
unable to because he had no money. Thus, if the agent volunteers when he
has $\kappa$ dollars and pays a cost of $\alpha_t$ in the current round, the expected value of
being able to spend that dollar in the future is $E[(1 - (1-\delta_t)/n)^{J(\kappa+1,p_u,p_d)}]\gamma_t$.
Since this expectation is strictly decreasing in $\kappa$ (an agent with more money
takes longer to spend it all), the maximum $\kappa$ such that Equation (5.1) holds
is an optimal threshold policy.

Taking the maximum value of $\kappa$ that satisfies Equation (5.1) ensures that,
for the $n$ we fixed, we chose the maximum optimal threshold. We now need
to show that this maximum optimal threshold is independent of $n$, which
we do by showing that the expected utility of every threshold policy $s_k$ is
independent of $n$. The expected utility of a policy depends on the initial
amount of money, but since an agent's current amount of money is a random
walk whose transition probabilities are determined by $p_u$ and $p_d$, there is a
well-defined limit probability

$$x^*_i = \lim_{r\to\infty} \Pr(\text{agent has } i \text{ dollars in round } r)$$

determined by the ratio $p_u/p_d$ (this is because the limit distribution satisfies
the detailed balance condition: $x^*_i p_u = x^*_{i+1} p_d$). This distribution has the
property that if the agent starts with $i$ dollars with probability $x^*_i$, then in every
round the probability he has $i$ dollars is $x^*_i$. Thus, in each round his expected
utility is $\gamma p_d(1 - x^*_0) - \alpha p_u(1 - x^*_k)$. We can factor out $n$ to write $p_u = p'_u/n$ and
$p_d = p'_d/n$, where $p'_u$ and $p'_d$ are independent of $n$. Note that $p_u/p_d = p'_u/p'_d$, so
the $x^*_i$'s are independent of $n$. Thus, we can rewrite the agent's expected utility
for each round as $c/n$, where $c = \gamma p'_d(1 - x^*_0) - \alpha p'_u(1 - x^*_k)$ is independent of
$n$. Therefore, the expected utility of $s_k$ is

$$\sum_{r=0}^{\infty}\left(1 - \frac{1-\delta_t}{n}\right)^{r}\frac{c}{n} = \frac{c}{n}\cdot\frac{n}{1-\delta_t} = \frac{c}{1-\delta_t},$$

which is independent of $n$. □
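The independence from $n$ can also be seen numerically. The sketch below is illustrative only: the function names are hypothetical, the stationary distribution is computed directly from the detailed balance condition $x^*_i p_u = x^*_{i+1} p_d$, and the per-round discount factor is assumed to be $1 - (1-\delta_t)/n$, as in Equation (5.1).

```python
def stationary_wealth(p_u, p_d, k):
    """Limit distribution x*_0, ..., x*_k of money for a threshold-k agent.

    Detailed balance x*_i * p_u == x*_{i+1} * p_d makes x*_i proportional
    to (p_u / p_d) ** i, so only the ratio p_u / p_d matters.
    """
    ratio = p_u / p_d
    weights = [ratio ** i for i in range(k + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def per_round_utility(alpha, gamma, p_u, p_d, k):
    """gamma * p_d * (1 - x*_0) - alpha * p_u * (1 - x*_k)."""
    x = stationary_wealth(p_u, p_d, k)
    return gamma * p_d * (1 - x[0]) - alpha * p_u * (1 - x[k])

def discounted_utility(alpha, gamma, delta, p_u_prime, p_d_prime, k, n):
    """Total expected utility when p_u = p_u'/n and p_d = p_d'/n.

    Summing the geometric series with per-round discount
    1 - (1 - delta)/n divides the per-round utility by (1 - delta)/n.
    """
    per_round = per_round_utility(alpha, gamma, p_u_prime / n, p_d_prime / n, k)
    return per_round / ((1 - delta) / n)
```

Calling `discounted_utility` with the same $\alpha$, $\gamma$, $\delta$, $p'_u$, $p'_d$, and $k$ but different values of $n$ returns the same number up to floating-point error, matching the value $c/(1-\delta_t)$.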

Theorem 5.2. For all games $G = (T, \vec{f}, h, m, n)$, all vectors $\vec{k}$ of thresholds,
and all $\varepsilon > 0$, there exist $n^*$ and $\delta^*_{\varepsilon,n}$ such that for all $n > n^*$, types $t \in T$,
and $\delta_t > \delta^*_{\varepsilon,n}$, an optimal threshold policy for $\mathcal{P}_{G,S(\vec{k}),t}$ is an $\varepsilon$-best reply to the
strategy profile $S(\vec{k})_{-i}$ for every agent $i$ of type $t$.

Proof. By Lemma B.2, $\mathcal{P}_{G,S(\vec{k}),t}$ has an optimal threshold policy. However,
this might not be a best reply for agent $i$ in the actual game if the other
agents are playing $S(\vec{k})$. $\mathcal{P}_{G,S(\vec{k}),t}$ assumes that the probabilities of earning or
spending a dollar in a given round are always exactly $p_u$ and $p_d$ respectively.
Theorem 4.1 guarantees only that, in the game, the corresponding probabilities
are close to $p_u$ and $p_d$ with high probability after some amount of time that
can depend on $n$. A strategy $S$ for player $i$ in $G$ defines a policy $\pi_S$ for $\mathcal{P}_{G,S(\vec{k}),t}$
in the obvious way; similarly, a policy for the MDP determines a strategy for
player $i$ in the game. The expected utility of $\pi_S$ is close to $U_i(S, S(\vec{k})_{-i})$, but
is, in general, not equal to it, because, as we noted, $p_u$ and $p_d$ may differ from
the corresponding probabilities in the game. They differ for three reasons: (1)
they are close, but not identical; (2) they are only close with high probability;
and (3) they are only close after some amount of time. As we now show, given
$\varepsilon$, the difference in the expected utility due to each reason can be bounded
by $\varepsilon/6$, so the expected utility of any strategy is within $\varepsilon/2$ of the value of the
corresponding policy in $\mathcal{P}_{G,S(\vec{k}),t}$. Thus, an optimal strategy for the MDP is
an $\varepsilon$-best reply.

As we have seen, the probabilities $p_u$ and $p_d$ are determined by the number
of agents of each type that volunteer (i.e., the expressions $v_{t'}$ for each type
$t'$). The distance between $d^{x^r}$ and $d^*$ bounds how much the actual number of
agents of type $t'$ that wish to volunteer in round $r$ can differ from $v_{t'}/\beta_{t'}$. Even
if exactly $v_{t'}/\beta_{t'}$ agents wish to volunteer for each type $t'$, there might not be
exactly $v_{t'}$ agents who actually volunteer because of the stochastic decision by
nature about who can volunteer and because $i$ cannot satisfy his own requests.
However, for sufficiently large $n$, the effect on $p_u$ and $p_d$ from these two factors
is arbitrarily close to zero. Applying Theorem 4.1, there exist $n_1$ and $r_1$
such that if there are at least $n_1$ agents, for all rounds $r > r_1$, $d^{x^r}$ and $d^*$ are
sufficiently close that the difference between the utility of policy $\pi_{S'}$ in the
MDP and $U_i(S', S_{-i})$ in rounds $r > r_1$ where $d^{x^r}$ is sufficiently close is at most
$\varepsilon/6$.

Note that the maximum possible difference in utility between a round of
the MDP and a round of the game is $\gamma + \alpha$ (if agent $i$ spends a dollar rather
than earning one). Applying Theorem 4.1 again, for $\varepsilon' = \varepsilon/(6(\gamma + \alpha))$, there
exist $n_2$ and $r_2$ such that the probability of the distribution not being within $\varepsilon'$
of $d^*$ is less than $\varepsilon'$. Thus, the difference between the expected utility of policy
$\pi_{S'}$ in the MDP and $U_i(S', S_{-i})$ in rounds $r > r_2$ where $d^{x^r}$ is not sufficiently
close is at most $\varepsilon'(\gamma + \alpha) = \varepsilon/6$.

Let $n^* = \max(n_1, n_2)$ and $r^* = \max(r_1, r_2)$. The values of $n^*$ and $r^*$ do
not depend on $\delta$, so we can take $\delta^*_{\varepsilon,n}$ to be sufficiently close to 1 that the total
utility from the first $r^*$ rounds is at most $\varepsilon/6$, completing the proof of the
theorem. □

Recall that $BR_G$ maps a vector $\vec{k}$ describing the threshold strategy for
each type to a vector $\vec{k}'$ of best replies.

Lemma 5.3. Consider the family of games $G_m = (T, \vec{f}, h, m, n)$ and the
strategies $S(\vec{k})$, for $mhn < \sum_t f_t k_t hn$. For this family of games, $\lambda_{m,\vec{k}}$ is
non-decreasing in $m$ and non-increasing in $\vec{k}$; $p_u^{m,\vec{k}}$ is non-decreasing in $m$ and
non-increasing in $\vec{k}$; and the function $BR_G$ is non-decreasing in $\vec{k}$ and
non-increasing in $m$.

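Proof. We first show that $\lambda_{m,\vec{k}}$ is monotone in $m$ and $\vec{k}$. We then
show that $p_u^{m,\vec{k}}$ is a monotone function of $\lambda_{m,\vec{k}}$ and that $BR_G$ is a monotone
function of $p_u^{m,\vec{k}}$, completing the proof.

We now show that $\lambda_{m,\vec{k}}$ is non-decreasing in $m$. Fix a vector of thresholds
$\vec{k}$ and let

$$g_{\vec{k}}(\lambda) = \sum_t \sum_{i=0}^{k_t} i\,\frac{f_t\lambda^i q_{\vec{k}}(t,i)}{\sum_{j=0}^{k_t}\lambda^j q_{\vec{k}}(t,j)}, \tag{B.4}$$

where $q_{\vec{k}}$ is the value of $q$ from Equation (A.1) (we add the subscript $\vec{k}$ to stress
the dependence on $\vec{k}$). The definition of $\lambda_{m,\vec{k}}$ in Equations (4.1) and (4.2) in
Lemma 4.2 ensures that, for all $m$, $m = g_{\vec{k}}(\lambda_{m,\vec{k}})$. A relatively straightforward
computation shows that $g'_{\vec{k}}(\lambda) > 0$ for all $\lambda$. Thus, if $m' > m$, $g_{\vec{k}}(\lambda) = m$,
and $g_{\vec{k}}(\lambda') = m'$, we must have $\lambda' > \lambda$. It follows that $\lambda_{m,\vec{k}}$ is increasing in
$m$. (Note that $\lambda_{m,\vec{k}}$ is undefined for $m \ge \sum_t f_t k_t$, which is why monotonicity
holds only for values of $m$ such that $mhn < \sum_t f_t k_t hn$.)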
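Because the average amount of money per agent is strictly increasing in $\lambda$, the value $\lambda_{m,\vec{k}}$ can be found by bisection. The following is a minimal sketch under stated assumptions: the function names are illustrative, the distribution is taken to satisfy $d(t,i) \propto f_t(\lambda\omega_t)^i$ for $0 \le i \le k_t$ (the normalizing constant of $q$ cancels in the ratio), and the bisection bracket is grown by doubling, which may overflow for $m$ extremely close to $\sum_t f_t k_t$.

```python
def money_per_agent(lam, fractions, omegas, thresholds):
    """Average money per agent when d(t, i) is proportional to
    f_t * (lam * omega_t) ** i for i = 0, ..., k_t."""
    total = 0.0
    for f_t, w_t, k_t in zip(fractions, omegas, thresholds):
        weights = [(lam * w_t) ** i for i in range(k_t + 1)]
        normalizer = sum(weights)
        total += f_t * sum(i * w for i, w in enumerate(weights)) / normalizer
    return total

def solve_lambda(m, fractions, omegas, thresholds, tol=1e-12):
    """Bisection for the lam > 0 at which money_per_agent equals m.

    Relies on money_per_agent being strictly increasing in lam, and
    requires 0 < m < sum_t f_t * k_t.
    """
    upper_limit = sum(f * k for f, k in zip(fractions, thresholds))
    if not 0 < m < upper_limit:
        raise ValueError("m must lie strictly between 0 and sum_t f_t * k_t")
    lo, hi = 0.0, 1.0
    # Grow the bracket until it contains the solution.
    while money_per_agent(hi, fractions, omegas, thresholds) < m:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if money_per_agent(mid, fractions, omegas, thresholds) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a single type with $\omega_t = 1$, threshold 2, and $m = 1$, the equation reduces to $\lambda^2 = 1$, so the sketch returns a value close to 1; raising $m$ raises the solution, and raising the threshold with $m$ fixed lowers it, in line with the monotonicity claims of the lemma.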

We next show that $\lambda_{m,\vec{k}}$ is non-increasing in $\vec{k}$. Since we have a finite set
of types, it suffices to consider the case where a single type $t^*$ increases its
threshold by 1. Let $\vec{k}$ denote the initial vector of thresholds, and let $\vec{k}'$ denote
the vector of thresholds after agents of type $t^*$ increase their threshold by 1;
that is, $k_t = k'_t$ for $t \ne t^*$, and $k'_{t^*} = k_{t^*} + 1$.

The first step in showing that $\lambda_{m,\vec{k}}$ is non-increasing in $\vec{k}$ is to show that
$g_{\vec{k}'}(\lambda_{m,\vec{k}}) \ge g_{\vec{k}}(\lambda_{m,\vec{k}}) = m$. We do this by breaking the sum in the definition
of $g$ in Equation (B.4) into two pieces: those terms where $t \ne t^*$, and those
where $t = t^*$.

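It follows immediately from Equation (A.1) that there exists a constant
$c$ such that, for all $i$ and $t \ne t^*$, we have $q_{\vec{k}}(t,i) = c\,q_{\vec{k}'}(t,i)$. It follows from
Equation (4.1) that for all $i$ and $t \ne t^*$, since $k_t = k'_t$, we have

$$\frac{f_t\lambda^i q_{\vec{k}'}(t,i)}{\sum_{j=0}^{k'_t}\lambda^j q_{\vec{k}'}(t,j)} = \frac{f_t\lambda^i q_{\vec{k}}(t,i)}{\sum_{j=0}^{k_t}\lambda^j q_{\vec{k}}(t,j)}; \tag{B.5}$$

that is, the corresponding terms in the sum for $g_{\vec{k}'}(\lambda_{m,\vec{k}})$ and $g_{\vec{k}}(\lambda_{m,\vec{k}})$ are the
same if $t \ne t^*$.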

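Now consider the corresponding terms for type $t^*$. First observe that for
all $i \le k_{t^*}$,

$$\frac{f_{t^*}\lambda^i q_{\vec{k}'}(t^*,i)}{\sum_{j=0}^{k'_{t^*}}\lambda^j q_{\vec{k}'}(t^*,j)} < \frac{f_{t^*}\lambda^i q_{\vec{k}}(t^*,i)}{\sum_{j=0}^{k_{t^*}}\lambda^j q_{\vec{k}}(t^*,j)}; \tag{B.6}$$

the two terms have essentially the same numerator (the use of $q_{\vec{k}'}$ instead of
$q_{\vec{k}}$ cancels out as in Equation (B.5)), but the first has a larger denominator
because $k'_{t^*} = k_{t^*} + 1$, so there is one more term in the sum. Since
$f_{t^*} = \sum_{i=0}^{k_{t^*}} d_{\vec{k}}(t^*,i) = \sum_{i=0}^{k'_{t^*}} d_{\vec{k}'}(t^*,i)$, by Equations (4.1) and (4.2),

$$\sum_{i=0}^{k'_{t^*}} \frac{f_{t^*}\lambda^i q_{\vec{k}'}(t^*,i)}{\sum_{j=0}^{k'_{t^*}}\lambda^j q_{\vec{k}'}(t^*,j)} = \sum_{i=0}^{k_{t^*}} \frac{f_{t^*}\lambda^i q_{\vec{k}}(t^*,i)}{\sum_{j=0}^{k_{t^*}}\lambda^j q_{\vec{k}}(t^*,j)} = f_{t^*}. \tag{B.7}$$

It follows that

$$\sum_{i=0}^{k'_{t^*}} i\,\frac{f_{t^*}\lambda^i q_{\vec{k}'}(t^*,i)}{\sum_{j=0}^{k'_{t^*}}\lambda^j q_{\vec{k}'}(t^*,j)} > \sum_{i=0}^{k_{t^*}} i\,\frac{f_{t^*}\lambda^i q_{\vec{k}}(t^*,i)}{\sum_{j=0}^{k_{t^*}}\lambda^j q_{\vec{k}}(t^*,j)}. \tag{B.8}$$

To see this, note that the two expressions above have the form $\sum_{i=0}^{k_{t^*}+1} i\,c_i$ and
$\sum_{i=0}^{k_{t^*}} i\,d_i$, respectively. By Equation (B.7), $\sum_{i=0}^{k_{t^*}+1} c_i = \sum_{i=0}^{k_{t^*}} d_i = f_{t^*}$; by
Equation (B.6), $c_i < d_i$ for $i = 0, \ldots, k_{t^*}$. Thus, in going from the right side
to the left side, weight is being transferred from lower terms to $k_{t^*} + 1$.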

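Combining Equations (B.5) and (B.8) gives us $g_{\vec{k}'}(\lambda_{m,\vec{k}}) \ge g_{\vec{k}}(\lambda_{m,\vec{k}}) = m$,
as desired. Since $g_{\vec{k}'}(\lambda_{m,\vec{k}'}) = m$, by definition, it follows that $g_{\vec{k}'}(\lambda_{m,\vec{k}}) \ge g_{\vec{k}'}(\lambda_{m,\vec{k}'})$.
Since, as shown above, $g_{\vec{k}'}$ is an increasing function, it follows that
$\lambda_{m,\vec{k}} \ge \lambda_{m,\vec{k}'}$. Thus, $\lambda_{m,\vec{k}}$ is decreasing in $\vec{k}$.

We now show that the monotonicity of $\lambda_{m,\vec{k}}$ implies the monotonicity of
$p_u^{m,\vec{k}}$. To do this, we show that, for all types $t$, $p_u^{m,\vec{k}} = p_d\lambda_{m,\vec{k}}\omega_t$. Since $\omega_t$ and
$p_d$ are independent of $m$ and $\vec{k}$, it then follows that the monotonicity of $\lambda_{m,\vec{k}}$
implies the monotonicity of $p_u^{m,\vec{k}}$. (Recall that $\omega_t = \beta_t\chi_t/\rho_t$ was defined in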
Section O) 

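Fix a type $t$. Then, dropping superscripts and subscripts on $p_u$, $d$, and $\lambda$
for brevity, we have the following sequence of equalities (where the explanation
for some of these lines is given following the equations):

$$\begin{aligned}
p_u &= \left(\frac{\sum_{t'}\rho_{t'}(f_{t'} - d(t',0))}{\sum_{t'}\rho_{t'}f_{t'}}\right)\left(\frac{\chi_t\beta_t}{\sum_{t'}\chi_{t'}v_{t'}}\right) && \text{(B.9)} \\
&= \left(\frac{\sum_{t'}\rho_{t'}\sum_{i=1}^{k_{t'}}d(t',i)}{\sum_{t'}\rho_{t'}f_{t'}}\right)\left(\frac{\chi_t\beta_t}{\sum_{t'}\chi_{t'}\beta_{t'}\sum_{i=0}^{k_{t'}-1}d(t',i)\,n}\right) && \text{(B.10)} \\
&= \left(\frac{\sum_{t'}\rho_{t'}\,\omega_{t'}\lambda\sum_{i=0}^{k_{t'}-1}d(t',i)}{\sum_{t'}\rho_{t'}f_{t'}}\right)\left(\frac{\chi_t\beta_t}{\sum_{t'}\chi_{t'}\beta_{t'}\sum_{i=0}^{k_{t'}-1}d(t',i)\,n}\right) && \text{(B.11)} \\
&= p_d\,\lambda\,\omega_t. && \text{(B.12)}
\end{aligned}$$

Equation (B.9) is just the definition of $p_u$ from Equation (B.1). Equation (B.10)
follows from the observation that, by Equation (4.1), $f_t = \sum_i d(t,i)$.
Equation (B.11) follows from the observation that, again by Equation (4.1),
$d(t,i) = \omega_t\lambda\,d(t,i-1)$. Equation (B.12) follows from the definitions
of $\omega_t$ and $p_d$ (see Equation (B.2)). Thus, as required, $p_u^{m,\vec{k}} = p_d\lambda_{m,\vec{k}}\omega_t$.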

Finally, we show that the monotonicity of $p_u^{m,\vec{k}}$ implies the monotonicity
of $BR_G$. Let $\vec{k}' = BR_G(\vec{k})$. By Lemma 5.1, $k'_t$ is the maximum value of $\kappa$
such that

$$\alpha_t \le E\big[(1 - (1-\delta_t)/n)^{J(\kappa,\,p_u^{m,\vec{k}},\,p_d)}\big]\gamma_t.$$

We (implicitly) defined the random variable $J(\kappa,p_u,p_d)$ as a function on
histories. Instead, we can define $J(\kappa,p_u,p_d)$ as a function on random bitstrings
(which intuitively determine a history). With this redefinition, it is clear that,
if $p_u \le p'_u$, for all bitstrings $b$, we have $J(\kappa,p_u,p_d)(b) \le J(\kappa,p'_u,p_d)(b)$. It
easily follows that

$$E\big[(1 - (1-\delta_t)/n)^{J(\kappa,p'_u,p_d)}\big] \le E\big[(1 - (1-\delta_t)/n)^{J(\kappa,p_u,p_d)}\big]$$

for all $\kappa$. Thus, the monotonicity of $BR_G$ follows from the monotonicity of
$p_u^{m,\vec{k}}$. □

Lemma 5.4. For all games $G = (T, \vec{f}, h, m, n)$, there exists a $\delta^* < 1$ such
that if $\delta_t > \delta^*$ for all $t$, there is a vector $\vec{k}$ of thresholds such that $BR_G(\vec{k}) > \vec{k}$.

Proof. Take $\vec{k}$ to be such that $k_t = \lceil m \rceil + 1$ for each type $t$. Then by
Theorem 5.2 there exists a $\vec{k}'$ such that $BR_G(\vec{k}) = \vec{k}'$. By Lemma 5.1, $k'_t$ is
the maximum value of $\kappa$ such that

$$\alpha_t \le E\big[(1 - (1-\delta_t)/n)^{J(\kappa,p_u,p_d)}\big]\gamma_t.$$

As $\delta_t$ approaches 1, $E[(1 - (1-\delta_t)/n)^{J(k_t+1,p_u,p_d)}]$ approaches 1, and so the right
hand side of Equation (5.1) approaches $\gamma_t$. For any standard agent, $\alpha_t < \gamma_t$.
Thus, there exists a $\delta_t$ such that

$$\alpha_t < E\big[(1 - (1-\delta_t)/n)^{J(k_t+1,p_u,p_d)}\big]\gamma_t.$$

For this choice of $\delta_t$, we must have $k'_t \ge k_t + 1 > k_t$. Take $\delta^* = \max_t \delta_t$. □